ABSTRACT



This study investigates the accuracy of the Woodruff-Causey technique for
estimating sampling errors for complex statistics. The technique may be
applied when data are collected by using multistage clustered samples. The
technique was chosen for study because of its relevance to the correct use
of multivariate analyses in educational survey research. To apply the
technique the researcher must be able to write Fortran subroutines and must
be able to ascertain a sampling error formula for a mean for whatever
sampling situation is to be used (i.e. look up one of the standard texts).
In return the technique will provide an estimate of the sampling error for
any statistic which can be expressed in terms of a Fortran subroutine.
Guides to numerical differentiation for the technique, and to the use of the
technique and the writing of the Fortran subroutines, are provided as
appendixes to this paper. (PN)
INTRODUCTION

When an educational researcher conducts a survey it is almost always carried
out in the administratively simple form of a clustered (and possibly
weighted and stratified) sample of schools and classes. If the analysis of
the data collected in this way is confined to means and differences between
means, then the sampling variability, which is crucial to inference and to
a complete understanding of the results, may be found using formulae
available in the standard texts (for example, Cochran, 1963, and Kish, 1965).
However, once the researcher attempts to use more sophisticated statistical
procedures, the 'standard formulae' are found to apply only to simple random
sampling. In the past, researchers have applied these erroneous 'standard
formulae' and (hopefully) have handled the results with suspicion. Previous
research (Peaker, 1975 and Ross, 1976) has shown that this suspicion is
well-founded. The search for a solution to this problem has thrown up
several approximate and intuitive techniques for estimating sampling errors
given just one sample as evidence (Kish and Frankel, 1974). It is the
purpose of this study to investigate the accuracy of one such approximation
technique (Woodruff and Causey, 1976) under several of the types of sampling
schemes that a typical educational research worker might be forced to employ.

Of course, the accuracy of the results is not the only criterion for 
evaluating such a technique. Ease of application is of great practical
importance, as is flexibility in the face of the diverse statistical and 
sampling situations which arise in educational research. The particular 
technique to be studied was chosen because it was found to be the only 
technique available which struck a worthwhile balance between the demands 
it places on the skills of the research worker and the range of possible 
applications in which it would be suitable. To apply the technique the 
researcher must be able to write a few Fortran subroutines and must be able 
to ascertain a sampling error formula for a mean for whatever sampling 
situation is to be used (i.e. look up one of the standard texts). In return 
the technique will provide an estimate of the sampling error for any 
statistic which can be expressed in terms of a Fortran subroutine. 
The two demands on the researcher are also investigated in this study.
A guide to the use of the technique and the writing of the Fortran
subroutines is provided as an Appendix in Microfiche to this Paper. Several
approximation formulae for the sampling error of a mean, which might apply
over a very wide range of sampling situations, are coupled with the
technique and their performances evaluated. The establishment of an
adequate approximation formula would considerably decrease the difficulty
in applying the technique and open the way for its incorporation into
'user-oriented' packages.
LITERATURE REVIEW



2.1 Introduction

The substance of most Sampling Theory textbooks (for example, Cochran
(1963) and Kish (1965)) is the estimation of descriptive statistics and
their standard errors for complex sample designs. Descriptive statistics are
aggregates and means, and their ratios and products. However, many
practitioners are also interested in estimating analytical statistics such
as regression coefficients, discriminant functions and correlation
coefficients for the complex samples they use. Theory is lacking for the
estimation of the standard error of analytical statistics for complex
samples: researchers have been forced to resort to the formulae supplied
by the textbooks for simple random sampling.

In order to alleviate this unfortunate situation attempts have been 
made to construct an appropriate theory with which to tackle the problem, 
but progress has been slow. Another solution was proposed by Tukey (1954): 

Statistical methods should be tailored to the real needs of the
user. 'What should be done' is almost always more important
than 'what can be done exactly'. Hence new developments in
experimental statistics are more likely to come in the form of
approximate methods than in the form of exact ones.

Several techniques for approximating standard errors from single 
samples have been described: I shall refer to them collectively as 
'single-sample techniques'. 

There are the replicated sampling techniques of Jackknifing and
Balanced Repeated Replication (also known as Pseudo-replication), the
random splitting technique (also known as Independent Replication and
Deming's Technique) and the Taylor's series approximation (variously known
as the linearization method, the delta-technique, the propagation of error,
and Taylorized deviations). A brief description of the first two and a
more detailed analysis of the last follows.

2.2 Replicated Sampling Techniques

Replicated Sampling Techniques were first used by Mahalanobis (1944, 1946)
in surveys of jute in India in 1936. Deming (1956, 1960) advocated
designing samples which are easily broken down into subsamples.




Two techniques which have gained prominence are Balanced Repeated
Replication and the Jackknife.

Suppose that a statistic y is being used to estimate a parameter $\theta$
according to some sampling plan. The first technique, Balanced Repeated
Replication, is used where the sample is divided into strata with two
units selected from each stratum. A replication is a half-sample created
by selecting one of the two sample units in each of the strata. The
replication process is repeated g times. Then the estimates $y'_r$
(r = 1, ..., g), which are formed by estimating the parameter from the
complementary half-samples of the replications, may be used to approximate
the variance of y thus:

$$\mathrm{Var}(y) \approx \frac{1}{g}\sum_{r=1}^{g}\left(y'_r - y\right)^2$$

McCarthy (1960) has shown that the most efficient strategy is to
select orthogonal replications only.
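The half-sample calculation can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the procedure used in the studies cited. It assumes two PSUs per stratum and represents each half-sample by doubling the weight of the retained PSU and zeroing the other. Orthogonal replicates are taken from the columns of a Sylvester (power-of-two) Hadamard matrix, in the spirit of McCarthy's recommendation. The names `sylvester_hadamard` and `brr_variance` are hypothetical.

```python
import numpy as np


def sylvester_hadamard(order):
    """Hadamard matrix of the given order (a power of two), by Sylvester's doubling."""
    h = np.array([[1]])
    while h.shape[0] < order:
        h = np.block([[h, h], [h, -h]])
    return h


def brr_variance(estimator, n_strata):
    """Balanced Repeated Replication variance for two PSUs per stratum.

    estimator(weights) must return the statistic for an (n_strata, 2) array of
    PSU weights; the full sample uses weight 1 for every PSU.
    """
    order = 1
    while order < n_strata + 1:  # one column per stratum, skipping the all-ones column
        order *= 2
    signs = sylvester_hadamard(order)[:, 1:n_strata + 1]  # g rows, orthogonal columns

    y_full = estimator(np.ones((n_strata, 2)))
    replicates = []
    for row in signs:
        # The half-sample keeps PSU 0 where the sign is +1, so its complement
        # keeps PSU 0 where the sign is -1. The retained PSU gets weight 2.
        weights = np.zeros((n_strata, 2))
        keep_first = row < 0
        weights[keep_first, 0] = 2.0
        weights[~keep_first, 1] = 2.0
        replicates.append(estimator(weights))
    replicates = np.asarray(replicates, dtype=float)
    g = replicates.size
    return float(np.sum((replicates - y_full) ** 2) / g)


# Hypothetical PSU totals for three strata (rows) with two PSUs each (columns).
psu_totals = np.array([[1.0, 3.0], [2.0, 6.0], [5.0, 4.0]])


def mean_of_totals(weights):
    """Weighted mean of the PSU totals."""
    return float(np.sum(weights * psu_totals) / np.sum(weights))


print(brr_variance(mean_of_totals, n_strata=3))
```

Here the statistic is linear in the PSU totals. For such a statistic the balanced replicates reproduce the textbook two-PSUs-per-stratum formula $\sum_h (y_{h1}-y_{h2})^2/(4H^2)$ exactly, which gives a convenient check on the arithmetic.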

For the second technique, the Jackknife, which was originally due to
Quenouille (1956) and Tukey (1958), the sample is divided into g groups of
size m. Then the values $y_k$, the estimates based on the m(g-1) observations
remaining after deleting the kth group of m observations, are used to
ascertain the 'pseudovalues' $y_k^*$ thus:

$$y_k^* = g\,y - (g-1)\,y_k \qquad k = 1, \ldots, g$$

These can then be used to form a jackknife estimate of $\theta$,

$$\bar{y}^* = \frac{1}{g}\sum_{k=1}^{g} y_k^*$$

and to estimate the variance of y:

$$\mathrm{Var}(y) \approx \frac{1}{g(g-1)}\sum_{k=1}^{g}\left(y_k^* - \bar{y}^*\right)^2$$

Investigations by Miller have suggested that these estimates will be
satisfactory when y can be expanded in a power series for each observation
with

(i) the first-order term linear or regular in the observations;
(ii) second and higher-order terms negligible.

Similar, though less restrictive, assumptions will be made later for the
Taylor's series approximation.
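The delete-a-group jackknife can be sketched in Python using the notation above. The sketch assumes the g groups are formed from consecutive observations of equal size m. In a survey the groups would normally follow the design instead, for example whole schools. `jackknife` is a hypothetical helper name.

```python
import numpy as np


def jackknife(values, estimator, n_groups):
    """Delete-a-group jackknife: returns the jackknife estimate and its variance.

    values are split, in their given order, into n_groups groups of equal size m.
    """
    values = np.asarray(values, dtype=float)
    if values.shape[0] % n_groups != 0:
        raise ValueError("the sample must divide into groups of equal size")
    groups = values.reshape(n_groups, -1, *values.shape[1:])
    g = n_groups

    y_full = estimator(values)
    pseudovalues = np.empty(g)
    for k in range(g):
        # Estimate from the m(g-1) observations left after deleting group k.
        remaining = np.delete(groups, k, axis=0).reshape(-1, *values.shape[1:])
        y_k = estimator(remaining)
        pseudovalues[k] = g * y_full - (g - 1) * y_k

    y_jack = pseudovalues.mean()
    variance = np.sum((pseudovalues - y_jack) ** 2) / (g * (g - 1))
    return y_jack, variance


est, var = jackknife([1, 2, 3, 4, 5, 6], np.mean, n_groups=3)
print(est, var)
```

For the sample mean the pseudovalues reduce to the group means. The jackknife estimate is then the ordinary mean, and the variance is that of the group means divided by g.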
2.3 Random Splitting of Samples

The random subsample technique was developed by Deming (1960) following
suggestions from J.W. Tukey. He estimated the variance of a statistic y
by splitting the sample into 10 equal, independent and random subsamples,
estimating the statistic for each subsample ($y_i$) and for the entire sample
($y$), and then approximating the variance of the statistic by the variance
of the mean ($\bar{y}$) of the subsample statistics:

$$\mathrm{Var}(y) \approx \frac{\sum_{i=1}^{10}\left(y_i - \bar{y}\right)^2}{10(10-1)}$$

Ten was the number recommended by Tukey, but the approximation holds, to a
greater or lesser extent, no matter how many subsamples are taken.
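The random-splitting estimator can be sketched in Python as follows. The sketch assumes equal subsamples and simple random allocation of elements; a clustered design would allocate whole clusters instead. `random_groups_variance` is a hypothetical name. Passing no random generator skips the shuffle, which is useful only for checking the arithmetic.

```python
import numpy as np


def random_groups_variance(values, estimator, n_groups=10, rng=None):
    """Random-subsample (Tukey/Deming) variance estimate.

    Splits the sample into n_groups equal subsamples (randomly, if an rng is
    given), computes the statistic on each, and returns the variance of their
    mean: sum((y_i - y_bar)^2) / (G(G - 1)).
    """
    values = np.asarray(values, dtype=float)
    if values.shape[0] % n_groups != 0:
        raise ValueError("the sample must divide into equal subsamples")
    if rng is not None:
        values = rng.permutation(values)  # random allocation to subsamples
    subsamples = values.reshape(n_groups, -1, *values.shape[1:])
    stats = np.array([estimator(s) for s in subsamples], dtype=float)
    G = n_groups
    return float(np.sum((stats - stats.mean()) ** 2) / (G * (G - 1)))


rng = np.random.default_rng(1)
sample = rng.normal(loc=50.0, scale=10.0, size=200)  # hypothetical scores
print(random_groups_variance(sample, np.mean, n_groups=10, rng=rng))
```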

Although superficially simple, this technique has several disadvantages
for educational research. First, the estimation of complicated statistics
may be neither stable, meaningful nor unbiased if only a small number of
subsamples is taken (Finifter (1972), Mosteller and Tukey (1968)). Yet using
enough subsamples to avoid this, each modelled on the possibly clustered and
stratified original sample, would negate the computational simplicity of the
original idea. Second, strata with small numbers of elements may need to be
combined to allow the total sample to be divided into a large number of
subsamples, resulting in a loss of detail. Third, if a large number of
subsamples is used, outliers in the original sample will have little chance
of appearing in some of the subsamples (Deming, 1956).

These difficulties have meant that researchers have concentrated on 
the other two techniques. 

2.4 The Taylor's Series Approximation

The use of a Taylor's series approximation to obtain an estimate of the
variance of a mean has been familiar to statisticians for some time. Its
use for 'analytical statistics' was described by Deming (1960:390-396) and
Kish (1965:585), and an early authoritative statement on its use was made
by Kendall and Stuart (1963:231).

Let g be a function of the sample variates $x_1, x_2, \ldots, x_k$, which are
assumed to take the expected values $\theta_1, \theta_2, \ldots, \theta_k$. If g is
differentiable at the point $(\theta_1, \theta_2, \ldots, \theta_k)$, then the
Taylor's series expansion of g about $(\theta_1, \theta_2, \ldots, \theta_k)$ is

$$g(x_1, x_2, \ldots, x_k) = g(\theta_1, \theta_2, \ldots, \theta_k) + \sum_{i=1}^{k}\frac{\partial g}{\partial x_i}(x_i - \theta_i) + \frac{1}{2!}\sum_{j=1}^{k}\sum_{i=1}^{k}\frac{\partial^2 g}{\partial x_i\,\partial x_j}(x_i - \theta_i)(x_j - \theta_j) + \cdots \qquad (1)$$

(Kendall and Stuart, 1963:231-232)

where the partial derivatives are calculated at the appropriate expected
values. The first-order approximation to g is

$$L_g(x_1, x_2, \ldots, x_k) = g(\theta_1, \theta_2, \ldots, \theta_k) + \sum_{i=1}^{k}\frac{\partial g}{\partial x_i}(x_i - \theta_i) \qquad (2)$$

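The first assumption made in the use of the Taylor's series approximation
is that the sampling distribution of g is approximately equal to the
sampling distribution of this linearized version of g. Thus

$$\mathrm{Var}(g) \approx \mathrm{Var}(L_g) = \mathrm{Var}\left[g(\theta_1, \ldots, \theta_k) + \sum_{i=1}^{k}\frac{\partial g}{\partial x_i}(x_i - \theta_i)\right] = \mathrm{Var}\left(\sum_{i=1}^{k}\frac{\partial g}{\partial x_i}\,x_i\right) \qquad (3)$$

since $g(\theta_1, \ldots, \theta_k)$ and $\sum_{i=1}^{k}\frac{\partial g}{\partial x_i}\theta_i$ are both
constants (Frankel, 1971:28).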
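Equation (3) is the variance of a linear combination of the variates. Once a covariance matrix $\Sigma$ for $x_1, \ldots, x_k$ is available, it can therefore be evaluated as $\nabla g^{\mathsf T}\,\Sigma\,\nabla g$. The Python sketch below does this for the ratio $g = x_1/x_2$. The expected values and the covariance matrix are hypothetical inputs; obtaining them for a complex design is precisely the difficulty the following paragraphs address.

```python
import numpy as np


def linearized_variance(gradient, covariance):
    """Var(sum_i dg/dx_i * x_i) = grad' Cov(x) grad, as in equation (3)."""
    gradient = np.asarray(gradient, dtype=float)
    covariance = np.asarray(covariance, dtype=float)
    return float(gradient @ covariance @ gradient)


def ratio_gradient(theta1, theta2):
    """Partial derivatives of g = x1 / x2 evaluated at (theta1, theta2)."""
    return np.array([1.0 / theta2, -theta1 / theta2 ** 2])


grad = ratio_gradient(10.0, 5.0)          # [0.2, -0.4]
cov = np.array([[4.0, 1.0], [1.0, 2.0]])  # hypothetical Cov(x1, x2)
print(linearized_variance(grad, cov))
```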



Actually using such an estimator depends, of course, upon obtaining
values for the partial derivatives. The second assumption involved in
the use of the Taylor's series approximation is that values of these
partial derivatives obtained from the sample are reasonable approximations
of their true values. Tepping (1968) made use of such a technique when
he estimated the sampling variance of a regression coefficient over a
multi-stage sampling design. Formulae for these partial derivatives
are available for some of the more common statistics such as ratio means,
correlation coefficients and regression coefficients (Frankel, 1971:30-31).

However, beyond this the ground is as yet unexplored. Furthermore, although
Tepping found a means of using this linear approximation in the particular
sampling situation he was investigating, he also noted that:

... the manner in which the variance of that linear approximation
may be estimated will of course depend on the sample design.
(Tukey, 1954:723)

Unfortunately the procedure for doing so is far from routine;
it was to this latter problem that Woodruff (1971) turned his attention.
By restricting the variates to those which are sums of the observations (or
sums of transformations of the observations), equation (3) may be
re-expressed thus:

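$$\mathrm{Var}(g) \approx \mathrm{Var}\left(\sum_{i=1}^{k}\frac{\partial g}{\partial x_i}\sum_{j=1}^{n}x_{ij}\right) \qquad (4)$$

where it is assumed that the observed values have been enumerated from 1 to
n for each variate $x_i$, so that $x_i = \sum_{j=1}^{n} x_{ij}$. As the two
summations are finite, their order may be reversed to give

$$\mathrm{Var}(g) \approx \mathrm{Var}\left(\sum_{j=1}^{n}\sum_{i=1}^{k}\frac{\partial g}{\partial x_i}\,x_{ij}\right) \qquad (5)$$

By defining a 'U-statistic' for each case by

$$U_j = \sum_{i=1}^{k}\frac{\partial g}{\partial x_i}\,x_{ij} \qquad j = 1, 2, \ldots, n \qquad (6)$$

the equation becomes

$$\mathrm{Var}(g) \approx \mathrm{Var}\left(\sum_{j=1}^{n}U_j\right) \qquad (7)$$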



Now these U-statistics are simply univariate statistics which are linearly
related to the original variates $x_1, \ldots, x_k$. The formula for the
evaluation of the variance in equation (7) is the one which would be
appropriate for the estimation of the variance of a variable under the
particular sampling design being used. This information is available in the
standard texts for a wide range of sample designs (see, for example, Cochran
(1963) and Hansen, Hurwitz and Madow (1953)). It should be noted that these
standard texts will often quote a formula for the sampling error of the mean
of a variable, which will have to be adjusted to give the variance of the
total of the variable, which is what is needed here. This procedure will be
referred to as the Woodruff algorithm, or the Woodruff technique.
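A minimal Python sketch of the Woodruff algorithm, equations (6) and (7), follows. It forms each unit's $U_j$ from the partial derivatives and the unit's contributions $x_{ij}$, then applies a design-appropriate variance formula for a total. The design formula is an assumption chosen for illustration: clusters (for example schools) are treated as drawn with replacement. Under that design, the variance of the sample total is estimated from the cluster totals of U as $\frac{a}{a-1}\sum_c (U_c - \bar{U})^2$ for $a$ clusters. Other designs need the corresponding textbook formula. Both function names are hypothetical.

```python
import numpy as np


def woodruff_u(partials, unit_values):
    """Equation (6): U_j = sum_i (dg/dx_i) * x_ij for every unit j.

    partials: shape (k,), the derivatives of g evaluated at the sample totals.
    unit_values: shape (n, k), unit j's contribution x_ij to each total x_i.
    """
    return np.asarray(unit_values, dtype=float) @ np.asarray(partials, dtype=float)


def cluster_total_variance(u, cluster_ids):
    """Variance of sum(u) for clusters assumed drawn with replacement."""
    u = np.asarray(u, dtype=float)
    _, index = np.unique(np.asarray(cluster_ids), return_inverse=True)
    totals = np.bincount(index.ravel(), weights=u)  # U summed within each cluster
    a = totals.size
    if a < 2:
        raise ValueError("at least two clusters are needed")
    return float(a / (a - 1) * np.sum((totals - totals.mean()) ** 2))


# g = x1 / x2 with hypothetical unit data: two variates, six pupils in three schools.
x = np.array([[3.0, 1.0], [5.0, 1.0], [4.0, 1.0], [6.0, 1.0], [2.0, 1.0], [4.0, 1.0]])
schools = [1, 1, 2, 2, 3, 3]
x1, x2 = x.sum(axis=0)
u = woodruff_u([1.0 / x2, -x1 / x2 ** 2], x)
print(cluster_total_variance(u, schools))
```

In this example $x_2$ is a column of ones, so g is the ordinary mean of $x_1$. With equal-sized clusters the result then equals the familiar with-replacement variance of a mean of cluster means, which gives a useful check on the arithmetic.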
A Worked Example. The description of the algorithm used to estimate
sampling errors will be made clearer and more concrete by the following
example:

Consider the simple linear regression of the variable y on the
variable x, with n observations,

$$y_i = a + b\,x_i + e_i \qquad i = 1, \ldots, n \qquad (8)$$

With the usual assumptions the best estimator of the regression slope b is

$$b = \frac{\sum_{i=1}^{n}(x_i - \bar{x})\,y_i}{\sum_{i=1}^{n}(x_i - \bar{x})^2} \qquad (9)$$

where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.



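If we define

$$s_i = (x_i - \bar{x})\,y_i, \qquad t_i = (x_i - \bar{x})^2 \qquad i = 1, \ldots, n \qquad (10)$$

as the variates to be used in the algorithm, then

$$b = \frac{\sum_{i=1}^{n} s_i}{\sum_{i=1}^{n} t_i} \qquad (11)$$

and if

$$s = \sum_{i=1}^{n} s_i, \qquad t = \sum_{i=1}^{n} t_i \qquad (12)$$

then

$$b = \frac{s}{t} \qquad (13)$$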

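Now the derivatives of the estimator with respect to each of the
totals s and t may be found,

$$\frac{\partial b}{\partial s} = \frac{1}{t}, \qquad \frac{\partial b}{\partial t} = -\frac{s}{t^2} \qquad (14)$$

Recourse to equation (7) then gives the Taylor approximation of the
variance of b as

$$\mathrm{Var}(b) \approx \mathrm{Var}\left(\frac{\partial b}{\partial s}\sum_{i=1}^{n}s_i + \frac{\partial b}{\partial t}\sum_{i=1}^{n}t_i\right) \qquad (15)$$

$$= \mathrm{Var}\left[\sum_{i=1}^{n}\left(\frac{s_i}{t} - \frac{s\,t_i}{t^2}\right)\right] \qquad (16)$$

$$= \mathrm{Var}\left(\sum_{i=1}^{n}U_i\right) \qquad (17)$$

where

$$U_i = \frac{s_i}{t} - \frac{s\,t_i}{t^2} \qquad (18)$$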

The variance involved in equation (17) is the variance appropriate 
for a total according to the particular sampling technique used. 
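The worked example translates directly into Python. The sketch follows equations (10) to (18), including the text's treatment of $\bar{x}$ as a fixed constant when differentiating. Purely as an assumed design, it uses the simple random sampling (with replacement) formula for the variance of a total, $\frac{n}{n-1}\sum_i (U_i - \bar{U})^2$; a clustered design would first sum the $U_i$ within clusters. The data and the function name are hypothetical.

```python
import numpy as np


def slope_and_taylor_variance(x, y):
    """Regression slope b = s/t and its Taylor (Woodruff) variance, eqs (10)-(18)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    dx = x - x.mean()
    s_i = dx * y                      # equation (10)
    t_i = dx ** 2
    s, t = s_i.sum(), t_i.sum()       # equation (12)
    b = s / t                         # equation (13)
    u = s_i / t - s * t_i / t ** 2    # equation (18): db/ds = 1/t, db/dt = -s/t^2
    # Variance of the total sum(u) under SRS with replacement (assumed design).
    var_b = n / (n - 1) * np.sum((u - u.mean()) ** 2)
    return b, var_b, u


b, var_b, u = slope_and_taylor_variance([1, 2, 3, 4], [2, 4, 5, 9])
print(b, var_b)
```

The $U_i$ sum to zero by construction ($s/t - s\,t/t^2 = 0$), so only their spread enters the variance.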

The restriction to functions of statistics which are totals of the
observations is not so great as it may appear at first glance. For instance,
a statistic as complicated as a multiple correlation coefficient may be
expressed as a function of the sums, sums of squares and sums of
cross-products of the variables involved in the regression equation. In this
case the original list of variates need only contain all of these for the
Woodruff algorithm to be implemented.

In a paper by Woodruff and Causey (1976), a computer program is described
which implements this algorithm and solves the problem of evaluating the
partial derivatives by the use of a numerical technique which avoids the
necessity of supplying a formula. It does, however, involve the writing of
at least one Fortran subroutine.

They checked the accuracy of this further approximation in three ways.
First, they compared the true partial derivatives with the numerical
approximations, and found that the greatest relative difference was less
than two parts in a million over a range of partial derivatives involved in
the calculation of 48 different estimates in a six-stratum sampling design.
Second, they compared the variance estimates for these 48 statistics with
those given by the Taylor's series approximation using analytic derivatives;
the relative differences were all less than one part in a million. Third,
for those statistics for which no analytic derivatives were available, they
compared the Taylor's series approximation using numerical derivatives with
the Balanced Repeated Replication and Jackknife techniques over a very wide
range of sampling designs; the results were found to be similar to those
that Frankel (1971) achieved in comparisons using analytical derivatives.
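The numerical differentiation used by Woodruff and Causey is not reproduced in this section. The following Python sketch therefore shows only the general idea, under an assumed scheme of central differences with a step proportional to the size of each total. The user supplies g as a function of the totals, the role played by the Fortran subroutine, and the resulting gradient then feeds equation (6). `numerical_gradient` is a hypothetical name.

```python
import numpy as np


def numerical_gradient(g, totals, rel_step=1e-6):
    """Central-difference estimates of dg/dx_i at the sample totals."""
    totals = np.asarray(totals, dtype=float)
    grad = np.empty_like(totals)
    for i in range(totals.size):
        h = rel_step * max(abs(totals[i]), 1.0)  # step scaled to the total
        up, down = totals.copy(), totals.copy()
        up[i] += h
        down[i] -= h
        grad[i] = (g(up) - g(down)) / (2.0 * h)
    return grad


def slope(totals):
    """The regression slope b = s / t of the worked example, as a function of (s, t)."""
    return totals[0] / totals[1]


# With s = 11 and t = 5 the analytic derivatives (14) are 1/t and -s/t^2.
print(numerical_gradient(slope, [11.0, 5.0]))
```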

2.5 Earlier Evaluations of the Taylor's Series Approximation

Several studies investigating the Taylor's series approximation for the
estimation of standard errors were conducted without the Woodruff-Causey
modifications. The three most important were those of Frankel (1971),
Mellor (1973) and Bean (1975).

Frankel used data collected by the US Bureau of the Census in the 1967
Current Population Survey to simulate clustered stratified sampling on the
basis of two primary sampling units per stratum. Comparison of the
Replication, Jackknife and Taylor's series techniques was made for several
sampling designs and for estimates of the mean, the difference of means,
simple correlation coefficients, regression coefficients and multiple
correlation coefficients. His conclusion was that although all three
techniques gave satisfactory estimates of variance (except possibly for
the multiple correlation coefficient), the Taylor technique resulted in
smaller mean square error whilst Balanced Repeated Replication gave a
better approximation to Student's 't' statistic. Mellor's design was
similar to this but used Monte Carlo simulation rather than existing
population data, and extended the comparison to partial correlation
coefficients. His conclusions were essentially the same as those of Frankel,
although he did note the comparative strength of the Taylor's series
approximation for error analysis of order statistics and highly skewed
distributions. Bean, working at the National Center for Health Statistics,
dismissed this use of synthetic populations as being 'of questionable
representativeness'. She concluded from her study that both Balanced
Repeated Replication and the Taylor's series approximation gave adequate
precision on the two criteria employed by Frankel (Bean, 1975:10-14).
Accompanying the Woodruff-Causey paper was an empirical study using
the same data as Frankel. The results of this study reinforced the
conclusions of Frankel, although it was noted that the results using the
Taylor's series approximation became substantially better with increased
sample size. However, Woodruff and Causey noted two other advantages of
this technique:

1 The Taylor method is probably more economical for computer
time, particularly in situations involving large numbers of
strata (and/or simple draws). With the Taylor approximation,
the basic data need be passed through the computer only twice,
once to evaluate the partial derivatives and then again to
form the substitute variables. The variances can then be
computed with a single pass of these substitute variables.
With the other two methods, the basic data must be tabulated
a large number of times to obtain the results for a large
number of partial samples. The 43,200 variances using the
Taylor-N method for the 6, 12 and 30 strata designs required
38 minutes of UNIVAC 1108 central computer time (6 cents per
variance at Census Bureau rates for this machine). The
21,600 variances for the 90, 270 and 810 strata designs
required 85 minutes of UNIVAC 1108 central computer time (2.3
cents per variance). This includes the cost of the derivative
evaluation as well as the actual variance computation.

2 The Taylor approximation is more versatile than the balanced
replication method, and can easily be applied to any design
for which there is a reasonable approximation to the variance
of a single variable. The balanced replication method is most
easily applied in sample designs involving a small number of
strata and two draws per stratum. It can become difficult in
other situations to find a balanced set of reasonable size.
(Woodruff and Causey, 1976:521)



A recent survey by Shah (1978) recommended the Taylor's series
approximation over the other three techniques. He summarized the situation
with Table 2.5.1, which is taken from his article. He also noted that
whereas for the Taylor's series approximation the total cost of computing
variances is about twice that of computing the mean only, the other
techniques require between 50 and 100 times the cost of computing the mean.
Furthermore, he points out that if interpretation of the data requires the
computation of variance components, the Taylor's series approximation is
the only appropriate technique.

Table 2.5.1 Comparison of Single-Sample Techniques (from Shah, 1978:32)

Technique             Assumptions           Restrictions on   Computational    Flexibility
                                            sample design     problems
Independent           Minimal               Severe            Simple
replications
Pseudo-replication    Independence of       2 PSUs per        Significant
                      complementary half    stratum
                      replicates
Taylorized            General central       None              Not difficult    Can be used for
deviations            limit theorem                                            variance components
Jackknife             Intuition             None              Greater than     May be useful for
                                                              Taylorized       some designs
                                                              deviations

2.6 Some Theoretical and Practical Advances

Krewski and Rao (1978) have investigated the theoretical basis for the
Taylor, Jackknife and Balanced Repeated Replication methods of sampling
error estimation. They have established that as the number of strata
approaches infinity, all three estimators are asymptotically normal and
consistent. Although not very useful from a practical point of view, this
result is nonetheless quite comforting. In a later paper they have also
investigated the small-sample properties of the three types of estimator;
the results reported there are of interest but have a very restricted range
of applicability due to the very strong model assumptions necessary in such
an investigation (Krewski and Rao, 1979).

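Bobko and Reick (1980) have made an interesting application of the
Taylor's series approximation to functions of correlation coefficients.
As in equation (4) above, they approximate the variance of a function g of
the correlation coefficients $r_1, r_2, \ldots, r_k$ thus:

$$\mathrm{Var}\big(g(r_1, r_2, \ldots, r_k)\big) \approx \sum_{i=1}^{k}\left[g_i(\rho)\right]^2\mathrm{var}(r_i) + \sum_{j=1}^{k}\sum_{\substack{i=1\\ i\neq j}}^{k} g_i(\rho)\,g_j(\rho)\,\mathrm{cov}(r_i, r_j)$$

where $g_i$ denotes the partial derivative of g with respect to $r_i$, and
$\rho$ is the vector of expected values of the correlation coefficients, i.e.

$$\rho = \big(E(r_1), E(r_2), \ldots, E(r_k)\big)$$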
Then, using a normality assumption and some further restrictions, they give
formulae for $\mathrm{var}(r_i)$ and $\mathrm{cov}(r_i, r_j)$. Formulae for the derivatives are
given for some simple statistics such as the correction for attenuation
and indirect effects in path analysis. The resultant standard errors are
then evaluated using data derived from synthetic populations. The emphasis
on normal distributions points up the restriction in usefulness of this
particular approach. In situations where the assumption of normal
distributions was not tenable (which is often the reason for trying a Taylor
approximation), the expressions for the variance and covariance would not
be applicable. The strength of this approach may lie not in the value of
the actual standard errors obtained in any particular situation, but
rather in the value of obtaining functional forms for the standard errors
in terms of the correlation coefficients. The existence of such forms,
even though based on quite restrictive assumptions, allows the investigation
of sampling errors on a different level to that which has previously been
possible.
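The Bobko and Reick expression has the same quadratic form as equation (3), applied to correlation coefficients. The Python sketch below works it out for the correction for attenuation, $g = r_{xy}/\sqrt{r_{xx} r_{yy}}$, whose partial derivatives are elementary. The variances and covariances of the $r$'s are hypothetical placeholders, because the normal-theory formulae Bobko and Reick derived are not reproduced in this section. Both function names are also hypothetical.

```python
import numpy as np


def attenuation_gradient(r_xy, r_xx, r_yy):
    """Partial derivatives of g = r_xy / sqrt(r_xx * r_yy) w.r.t. (r_xy, r_xx, r_yy)."""
    g = r_xy / np.sqrt(r_xx * r_yy)
    return np.array([1.0 / np.sqrt(r_xx * r_yy), -0.5 * g / r_xx, -0.5 * g / r_yy])


def variance_of_function(grad, cov):
    """Sum of [g_i]^2 var(r_i) plus the i != j covariance terms."""
    grad = np.asarray(grad, dtype=float)
    cov = np.asarray(cov, dtype=float)
    variance_terms = np.sum(grad ** 2 * np.diag(cov))
    off_diagonal = cov - np.diag(np.diag(cov))
    covariance_terms = grad @ off_diagonal @ grad
    return float(variance_terms + covariance_terms)


grad = attenuation_gradient(0.30, 0.81, 0.64)
cov_r = np.array([[0.0040, 0.0010, 0.0008],   # hypothetical var/cov of (r_xy, r_xx, r_yy)
                  [0.0010, 0.0020, 0.0005],
                  [0.0008, 0.0005, 0.0025]])
print(np.sqrt(variance_of_function(grad, cov_r)))  # standard error of the corrected r
```

Writing the variance and covariance sums separately mirrors the formula as printed. The total is identical to the matrix product $\nabla g^{\mathsf T}\,\Sigma\,\nabla g$.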

Since the publication of the Woodruff-Causey paper, several general
programs using Taylor series approximations have become available. There
is, of course, the original Woodruff-Causey program. Next was Shah's
STDERR (Standard Errors Program for Sample Survey Data), which computes
certain ratio estimates or totals and their standard errors from the data
collected in a complex multistage sample survey and is available within
the SAS package (Shah, 1974). Hidiroglou, Fuller and Hickman (1975)
published SUPER CARP (Cluster Analysis and Regressions Programme), which
estimates totals, ratios, differences of ratios and regression
coefficients and their associated variances for several multistage complex
designs and for a one-fold nested error structure. M.M. Holt (1977) has
produced SURREGR (Standard Errors of Regression Coefficients) for the
testing of hypotheses concerning regression models using a stratified
multistage sampling design and ordinary least squares or weighted least
squares. The World Fertility Survey has produced a program called CLUSTERS
which uses the 'collapsed strata' technique mentioned earlier to produce
error estimates for ratio estimators (Verma and Pearce, 1978). The Office
of Research and Statistics within the U.S. Social Security Administration
is developing a software package designed to accommodate many different
sampling designs, but it is as yet able to offer the Taylor Approximation
only in the Keyfitz form (see equation (5)) (Finch, 1978). A survey of
the many computer programs available, summarising a few important features
for each, has also appeared (Kaplan, Francis and Sedransk, 1979). One method
of evaluating these packages has been pursued by several researchers
(Woodruff and Causey, 1976 and Maurer, Jones and Bryant, 1978). This
involves the comparison of the programs with respect to their computational
efficiency, evaluated in terms of central processing time, for a
representative sample of designs. This comparison may loom large in the eyes
of computer programmers, but for a research worker the issues of ease of
application and adaptability to different situations will prove much more
important.

Although much valuable work has been done at many research centres,
it has invariably been concerned with the solution of the sampling
error problem in terms of the particular style of sample design dominant
at each centre and in terms of the particular range of statistics that are
studied there. The incorporation of sampling error routines into such
packages as SAS and OSIRIS has begun and will eventually make the
calculation of sampling errors a routine procedure within the limitations of
the application of those packages. It would seem, however, that beyond
this the researcher will be forced either to write entire programs for
whichever single-sample technique is chosen, or to write the type of
semi-standard subroutines which are necessary to the application of the
Woodruff-Causey program.

2.7 Use of Variance Estimation in Educational Research

Attention to the problem of variance estimation by educational and
psychological researchers was urged by Marks (1947) in connection with a
revision of the Stanford-Binet Scale.

Ignoring the effects of cluster sampling on measures of sampling
error has undoubtedly resulted in attaching importance to results
which are statistically insignificant. (Marks, 1947:413)

He found that the standard errors as calculated by the simple random
sampling formulae were underestimating the true standard errors by a factor
of three. The first investigation of sampling errors for a large-scale
educational survey was made by Peaker (1953). Standard errors were found
to be underestimated by half in this case.

The whole topic was consolidated with the work of Kish, who introduced
the statistic 'Deff' (design effect), which is

the ratio of the actual variance of a sample to the variance
of a simple random sample of the same number of elements.
(Kish, 1965:258)

A useful modification of this is the 'design factor', abbreviated as
'deft' and equal to the square root of the design effect (Verma et al.,
1980).
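Kish's definitions can be illustrated numerically in Python. The sketch assumes a sample of equal-sized clusters drawn with replacement, so that the variance of the overall mean is estimated from the spread of the cluster means. It compares that variance with the simple-random-sampling variance $s^2/n$ for the same number of elements, ignoring finite population corrections. The scores and the function name are hypothetical.

```python
import numpy as np


def design_effect(clusters):
    """Deff and deft for the mean of equal-sized clusters (rows = clusters)."""
    data = np.asarray(clusters, dtype=float)
    a, m = data.shape
    n = a * m
    # Clustered design: variance of the mean of the a cluster means.
    var_cluster = data.mean(axis=1).var(ddof=1) / a
    # Simple random sample of the same n elements.
    var_srs = data.ravel().var(ddof=1) / n
    deff = var_cluster / var_srs
    return deff, np.sqrt(deff)


deff, deft = design_effect([[62, 58, 65, 60],
                            [45, 50, 48, 47],
                            [70, 72, 68, 75]])
print(f"Deff = {deff:.2f}, deft = {deft:.2f}")
```

When elements within a cluster resemble one another, Deff rises above 1. Treating such data as a simple random sample then understates the standard errors by roughly the factor deft.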

Kish used balanced repeated replication to estimate Deff values from
a sample of 2,200 tenth-grade boys in American public schools (Bachman
et al., 1967). Deff for sample means was found to be less than three, and
for correlation coefficients and ratios it was found to be about 2.3.

A modification of Deming's technique using the range of estimates
provided by four independent samples was used by Peaker (1967) in an
international study of mathematics achievement. He found Deff values of
correlation coefficients ranging from 1.96 (in Japan, where clusters of 10
students per school were selected) to 8.4 (in Scotland, where 75 students
per school were selected).

Keeves (1966) decomposed the total variance into variance due to classrooms
and variance due to students in what appears to be the first application of
these techniques to Australian educational data. He also calculated Deff
values ranging from 1.00 to 21.3 using a similar method to that of Peaker
(1967).

Jackknife procedures were used by Peaker (1975) in an international 
study of achievement (Comber and Keeves, 1973) in which he found average 
Deff values of 6 for means, 2.5 for correlations and 2 for regression 
coefficients; the primary sampling unit used was the school. 

Ross (1976) used an empirical approach to estimate Deff for several 
typical sample designs and statistics in common use. He found that the 
lowest values of Deff occurred for designs that used schools as the primary 
sampling unit and also for the more complex multivariate statistics. A 
comparison of these results with Balanced Repeated Replication and Jackknife 
estimates revealed that both techniques were performing reasonably well on 
the average. However, he points out that individual estimates vary quite
considerably from the empirically-derived results.







CHAPTER 3 



DESIGN OF THIS STUDY

3.1 Introduction

The chapter which follows describes the procedures used to examine the
Taylor Approximation. A previous study is described in detail, as its
data-base and subsequent empirical analyses provide a bench-mark against
which the technique can be compared. The comparison is in two parts.
Firstly, the Taylor Approximation is compared to the empirically-established
'true' estimates of variance. Secondly, it is compared with two other
single-sample techniques which were investigated in the previous study.

3.2 A Previous Study

The present study capitalizes on data collected by Keeves (1971) and later 
analysed by Ross (1976). 

The remainder of the section is devoted to a summary of this data-base
and the analyses to which it was subjected by Ross. Further details may be
found in Ross (1976) and Keeves (1971).

The Data-base. The population under study consisted of 2354 Year 7
students in the Australian Capital Territory in August 1969. This was 95 
per cent of all such students: data sets which so nearly encompass a 
genuine population are extremely rare in educational research. 

The students came from three school 'systems'. System 1 is a collection 
of nine government schools with fifty-three Year 7 classes. System 2 is a 
collection of four Catholic schools with fifteen Year 7 classes. System 3 
is a pair of independent schools with seven Year 7 classes. 

Keeves gathered data on a large range of variables for this population.
Five were selected by Ross for inclusion in a causal model; they were chosen
to represent a wide range of types of variable, to provide a range of
magnitudes of the intercorrelations between them, and to constitute a
meaningful model of educational achievement. These variables are described
in Table 3.2.1.

The Causal Model. The causal model used by Ross is an example of the
'Path Analysis' technique (Duncan, 1975). This technique and its application
to a particular situation could be subjected to any number of criticisms.
Table 3.2.1 The Variables in the Causal Model

Variable name   Description

SEX             Coded on a two-point scale with male = 1, female = 2.
FOCCUP          The occupation of the student's father, coded on a six-point
                occupational prestige scale (Broom et al., 1977).
LIKESCHL        A 17-item scale designed to measure the student's attitude
                towards school.
EXPEDN          A seven-point rating designed to measure the student's
                level of aspiration for further education.
MATHS           A test of 55 mathematics items, each of which was
                scored: correct = 1, incorrect = 0.



However, the model is used in this study merely as an example of the type 
of correlational analysis widely used in educational research. 

The model investigates the relative influences among the variables 
under the assumption of a certain ordering of causality: 

1 Antecedent student characteristics influence 

2 Attitudes toward school and these characteristics and attitudes 
influence 

3 Aspirations towards further education and these characteristics, 
attitudes, and aspirations influence 

4 Achievement in Mathematics. 

These influences are measured by what are termed 'path coefficients',
which may be shown to be equal to standardized regression coefficients
(Kerlinger and Pedhazur, 1973:310-14). The first stage in this causal chain
consists of variables for which it is assumed that causes outside the model
completely determine variability. At each subsequent stage it is assumed
that causality is unidirectional; that is, no variable can be both cause
and effect of another. A residual variable is included at each stage to
account for all other sources of variation (these are referred to by
lower-case letters a, b, c, etc.). It is assumed that a residual variable is
correlated neither with other residual variables nor with the variables
in the model to which it is not attached.
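Because path coefficients are standardized regression coefficients, each stage of the model can be estimated by solving the normal equations on the correlation matrix, $R_{xx} b = r_{xy}$. The Python sketch below is illustrative only (the helper `solve` is hypothetical, not part of the original study); as a check, it feeds in the population correlations of Table 4.1.1 for SEX, FOCCUP and LIKESCHL and should recover the corresponding path coefficients and multiple correlation listed there to within rounding.

```python
# Path coefficients as standardized regression coefficients: solve R_xx b = r_xy,
# where R_xx holds correlations among the causes and r_xy their correlations
# with the effect. The helper name is illustrative only.

def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


# Population correlations (Table 4.1.1): S = SEX, F = FOCCUP, L = LIKESCHL
r_sf, r_sl, r_fl = -0.01123, 0.14908, -0.13988
p_sl, p_fl = solve([[1.0, r_sf], [r_sf, 1.0]], [r_sl, r_fl])
r_squared = p_sl * r_sl + p_fl * r_fl  # squared multiple correlation for LIKESCHL
print(round(p_sl, 4), round(p_fl, 4), round(r_squared ** 0.5, 4))
```

The same call with three or four causes handles the EXPEDN and MATHS stages; only the size of the correlation matrix changes.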

The model is illustrated in Figure 3.2.1. In interpreting correlation
coefficients and path coefficients associated with this figure it should be
noted that 'father's occupation' is not scored in the 'usual' direction,
so that a high score on this variable corresponds to a low relative rating on
the scale of occupational prestige.

Figure 3.2.1 The Causal Model

The Sample Designs. In order to establish the effects of different
sampling strategies, Ross chose five sample designs and drew, from the
population described above, twenty-five samples of 150 students according
to each of the sample designs. Samples of size 150 were deemed appropriate,
firstly because this is large enough to achieve stable estimates of the analytic
statistics used in correlational analyses involving a 'medium' number of
variables, and secondly because it is an example of the research designs which would
be within the economic and administrative resources of the typical
educational research worker. Twenty-five replications were considered sufficient
to establish reliable empirical data for the sampling distributions of the
various statistics associated with the causal model. The five sampling
designs are described below:

Design 1: Simple random sample of 150 students (SRS design). 

Each sample is a simple random sample of 150 students from the entire 
population. 

Design 2: Stratified proportional simple random sample of 150 students
(STR design).
Table 3.2.2 Contribution of Each Stratum to the STR Design

            Number of       Proportion      Number of      Proportion
            students in     of              students       of
Stratum     population      population      in sample      sample

1           1611            0.684           103            0.687
2            539            0.229            34            0.227
3            204            0.087            13            0.086
Total       2354            1.000           150            1.000



The strata chosen were the three school systems. Each stratum 
contributed to the sample in proportion to its size within the entire 
population; and within each stratum an independent simple random sample 
of students was chosen. The number of students from each stratum is 
shown in Table 3.2.2. 
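The proportional allocation can be checked with a few lines of arithmetic. The sketch below is a minimal illustration (the function name is hypothetical); it takes the stratum sizes as 1611, 539 and 204, where 539 is the population total of 2354 less the other two strata, and rounds each stratum's share of the 150 cases to the nearest whole student.

```python
def proportional_allocation(stratum_sizes, n):
    """Allocate a sample of n elements to strata in proportion to stratum size.

    Each share is rounded to the nearest integer. A general-purpose version would
    also need a rule (for example, largest remainder) to force the rounded sizes
    to sum to n; here they happen to do so already.
    """
    big_n = sum(stratum_sizes)
    return [round(n * size / big_n) for size in stratum_sizes]


sizes = [1611, 539, 204]  # Year 7 students in the three school systems
print(proportional_allocation(sizes, 150))  # [103, 34, 13]
```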

Design 3: Probability proportional to size selection of six primary
sampling units (schools) followed by simple random selection
of twenty-five students within each selected cluster (SCL design)

The fifteen schools were each allotted a probability of selection
proportional to their size; six were then chosen, without replacement,
according to these probabilities. Within each school chosen, twenty-five
students were selected as a simple random sample.
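The text does not say which scheme for selecting with probability proportional to size (PPS) without replacement was used, so the following is only a sketch of one simple possibility: a draw-by-draw method in which each successive school is chosen with probability proportional to size among the schools not yet selected, followed by a simple random sample of 25 students in each chosen school. The fifteen school sizes are hypothetical (they sum to the population total of 2354, but the individual values are invented), and the function name is illustrative.

```python
import random


def pps_without_replacement(sizes, k, rng):
    """Select k distinct units; each draw is made with probability proportional
    to size among the units not yet selected (draw-by-draw scheme)."""
    remaining = list(range(len(sizes)))
    chosen = []
    for _ in range(k):
        pick = rng.choices(remaining, weights=[sizes[i] for i in remaining], k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen


# Hypothetical Year 7 enrolments for fifteen schools (invented for illustration).
school_sizes = [310, 270, 250, 230, 210, 190, 170, 150, 130, 110, 90, 80, 60, 60, 44]

rng = random.Random(1969)
schools = pps_without_replacement(school_sizes, 6, rng)
# Second stage: a simple random sample of 25 students within each chosen school.
sample = {s: rng.sample(range(school_sizes[s]), 25) for s in schools}
print(schools)
```

Other PPS-without-replacement schemes (systematic PPS, for instance) give different joint selection probabilities, which matters for the exact variance estimator discussed in Section 3.3.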

Design 4: Probability proportional to size selection of six primary
sampling units (classes) followed by simple random selection
of twenty-five students within each selected cluster (CLS design)

The sampling frame was first rearranged so that no class was smaller
than twenty-five. Small classes were amalgamated to form 'pseudoclasses',
and the same process was applied to these 'pseudoclasses' and to the larger
classes as was applied to the schools in the SCL design.

Design 5: Stratified cluster sample of 150 elements with two primary 

sampling units (classes) being chosen from each stratum with 
probability proportional to size selection followed by simple 
random selection of 25 elements within each selected cluster 
(WTD design).
Table 3.2.3 Weights Used in the WTD Design

               Number of         Number of
               students in       students in
Stratum        the stratum       the sample        Weight
(h)            (N_h)             (n_h)             (k_h)

1              1611              50                2.053
2               539              50                0.687
3               204              50                0.260
Total          N = 2354          n = 150           3.000



The sampling frame was first rearranged as for the CLS design. The
same techniques were then applied to the set of classes and 'pseudoclasses'
within each stratum as were applied to the SCL design, but only two
selections were made. As this results in fifty students being selected
from each stratum, the data for each selected student were weighted in
proportion to the size of the stratum from which the student was selected.

If N is the population size,
n is the total sample size,
N_h is the size of stratum h in the population, and
n_h is the size of stratum h in the sample,
then the weight for stratum h is

$$k_h = \frac{N_h}{N}\cdot\frac{n}{n_h} \qquad \text{(Kish, 1965:429)}$$

Table 3.2.3 details the calculation of these weights for each stratum. 
(Note that for the other four sample designs each element of the population 
has the same chance of being selected, and hence, no weights were needed.) 
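As a check on Table 3.2.3, the weight for each stratum is its share of the population divided by its share of the sample. The minimal sketch below (the function name is illustrative) computes these weights for the WTD design, taking the stratum sizes as 1611, 539 and 204, and confirms that the weighted sample size still equals the total of 150.

```python
def stratum_weights(pop_sizes, sample_sizes):
    """Relative weight for each stratum: population share divided by sample share,
    k_h = (N_h / N) * (n / n_h)."""
    big_n, n = sum(pop_sizes), sum(sample_sizes)
    return [(n_pop / big_n) * (n / n_samp) for n_pop, n_samp in zip(pop_sizes, sample_sizes)]


pop = [1611, 539, 204]  # students per school system
samp = [50, 50, 50]     # two classes of 25 from each stratum (WTD design)
weights = stratum_weights(pop, samp)
print([round(k, 3) for k in weights])  # [2.053, 0.687, 0.26]
# The weighted sample size reproduces n (approximately 150, up to floating point).
print(sum(k * m for k, m in zip(weights, samp)))
```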

The Sampling Error Formulae. The statistics chosen for study were
the Mean, the Correlation Coefficient, the Standardized Regression Coefficient
and the Multiple Correlation Coefficient. The sampling error formulae
appropriate for each of these statistics under simple random sampling are given
in Table 3.2.4. All except that for the Correlation Coefficient are standard
results. For that statistic, however, the more usual sampling error formula is

$$\sigma_r = \frac{1 - r^2}{\sqrt{n}}$$
Table 3.2.4 Sampling Error Estimation Formulae Used to Estimate the
Denominator of the Equation which Defines the Design Effect

Sample statistic                    Estimation formula

Mean ($\bar{X}$)                    $\sigma_{\bar{X}} = \dfrac{s}{\sqrt{n}}$
                                    (Guilford and Fruchter, 1973:127)

Correlation coefficient ($r$)       $\sigma_r = \dfrac{1}{\sqrt{n}}$
                                    (Guilford and Fruchter, 1973:145)$^{a}$

Standardized regression             $\sigma_{b_{12.34\ldots m}} = \sqrt{\dfrac{1 - R^2_{1.23\ldots m}}{\left(1 - R^2_{2.34\ldots m}\right)(n - m)}}$
coefficient ($b$)                   (Guilford and Fruchter, 1973:368)

Multiple correlation                $\sigma_R = \dfrac{1 - R^2}{\sqrt{n - m}}$
coefficient ($R$)                   (Guilford and Fruchter, 1973:367)$^{b}$



This formula was not used, (a) because we wished to provide the reader with
an example of how to use this technique in the relatively simple problem of
testing whether the Correlation Coefficient is zero (for this test one
assumes that r vanishes in order to calculate the sampling error, and so the
above formula reduces to the one given in Table 3.2.4), and (b) because
there is some debate over the utility of this formula when r is small
(see McNemar, 1969:155), which is the case for several of the correlations
under investigation.

Results and Conclusions. Ross used the values of the square root of
the Design Effect, 'deft', to measure the sampling errors. The equation
defining this statistic is

$$\mathrm{deft} = \frac{\sigma_c}{\sigma_{srs}}$$

where $\sigma_c$ is the estimate of the standard deviation of the statistic
for the complex sampling design under consideration,
and $\sigma_{srs}$ is the estimate of the standard deviation for the same
statistic which would be obtained if simple random sampling
formulae were used.
The estimate $\sigma_c$ was, of course, one of the goals of Ross's study.
The formulae Ross used for $\sigma_{srs}$ were derived from one source (Guilford
and Fruchter, 1973), and are detailed in Table 3.2.4. The formulae apply
to the case of a simple random sample of n elements on m variables, where
the variable X has a standard deviation of s. The multiple correlation
coefficient $R_{1.23\ldots m}$ refers to the regression equation which has
variable 1 as the criterion and variables 2, 3, ..., m as the predictors.
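The simple-random-sampling formulae translate directly into code. In the sketch below the function names are hypothetical, and the standardized-regression and multiple-correlation expressions are assumed to be the usual textbook forms (as in Guilford and Fruchter), with m counting all variables including the criterion. The example applies them to the MATHS regression, with R = 0.54211 from Table 4.1.1 and n = 150.

```python
import math


def se_mean(s, n):
    """Standard error of a mean: s / sqrt(n)."""
    return s / math.sqrt(n)


def se_correlation_null(n):
    """Standard error of r when the population correlation is taken as zero."""
    return 1.0 / math.sqrt(n)


def se_standardized_b(r2_1, r2_2, n, m):
    """Standard error of the standardized regression coefficient b_12.34...m.

    r2_1: squared multiple correlation of variable 1 on variables 2..m
    r2_2: squared multiple correlation of variable 2 on variables 3..m
    m:    number of variables, criterion included
    """
    return math.sqrt((1 - r2_1) / ((1 - r2_2) * (n - m)))


def se_multiple_r(r, n, m):
    """Standard error of a multiple correlation R based on m variables."""
    return (1 - r * r) / math.sqrt(n - m)


# Example: MATHS regressed on the other four variables (R = 0.54211, Table 4.1.1)
print(round(se_correlation_null(150), 4), round(se_multiple_r(0.54211, 150, 5), 4))
```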

The values of the square root of the Design Effect (deft) for each
statistic, averaged over the twenty-five replications, were calculated by
Ross for each of the five sample designs.

From this evidence Ross concluded that 

... the use of complex sample designs to gather data may 
greatly influence the sampling stability of the statistics 
required to describe a recursive causal model. (Ross, 1976:45) 

Ross also calculated the values of deft given by two of the
single-sample techniques, using one sample for each. Balanced Repeated Replication
was used with the WTD design, and Jackknifing was used with the CLS design.

From these two cases Ross concluded that both techniques provided
'useful estimates of average √Deff'.

3.3 The Estimation of Sampling Errors from Single Samples Using a
Taylor Approximation

The Woodruff Algorithm was applied to each of the five sample designs in
order to estimate sampling errors. The process was repeated twenty-five
times for each design to obtain a reliable guide to the behaviour of the
estimate. The procedure followed is described in the remainder of this
section; the results obtained are discussed in Chapter 4. Details of the
application of the computer program may be found in Wilson (1981).
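The idea behind the Woodruff technique can be sketched briefly: a complex statistic is written as a smooth function g of sample means, a first-order Taylor expansion replaces it by a linear combination of element-level values $u_i = \sum_j (\partial g/\partial \mu_j)\, z_{ij}$, and any variance estimator for a mean can then be applied to the $u_i$. The Python below is a minimal sketch under those assumptions, not the Woodruff-Causey program itself; the partial derivatives are taken by central differences, and the function names and toy data are hypothetical.

```python
import math


def numerical_gradient(g, mu, rel_step=1e-6):
    """Central-difference partial derivatives of g at the point mu."""
    grad = []
    for j in range(len(mu)):
        h = rel_step * max(1.0, abs(mu[j]))
        up, down = list(mu), list(mu)
        up[j] += h
        down[j] -= h
        grad.append((g(up) - g(down)) / (2 * h))
    return grad


def linearized_values(g, z):
    """First-order (Taylor) values u_i = sum_j dg/dmu_j * z_ij, with the derivatives
    taken at the sample means mu of the element-level variables in z."""
    n, k = len(z), len(z[0])
    mu = [sum(row[j] for row in z) / n for j in range(k)]
    grad = numerical_gradient(g, mu)
    return [sum(d * v for d, v in zip(grad, row)) for row in z]


def correlation(m):
    """Pearson r written as a function of the means of x, y, x^2, y^2 and xy."""
    mx, my, mxx, myy, mxy = m
    return (mxy - mx * my) / math.sqrt((mxx - mx ** 2) * (myy - my ** 2))


xs = [2.0, 4.0, 5.0, 7.0, 9.0, 10.0]  # toy data
ys = [1.0, 3.0, 2.0, 6.0, 7.0, 9.0]
z = [[x, y, x * x, y * y, x * y] for x, y in zip(xs, ys)]
u = linearized_values(correlation, z)

# Any variance estimator for a mean can now be applied to the u_i; here the
# simple with-replacement SRS form (f = 0) is used purely for illustration.
n = len(u)
u_bar = sum(u) / n
se_r = math.sqrt(sum((ui - u_bar) ** 2 for ui in u) / (n * (n - 1)))
print(round(correlation([sum(col) / n for col in zip(*z)]), 4), round(se_r, 4))
```

Swapping the final variance step for a stratified or clustered estimator is what adapts the same linearized values to each sample design.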

Variance Estimators for the Sample Designs . As mentioned previously, 
a Fortran subroutine providing an estimate of the variance of a total must 
be supplied to the program. The formula for each of the five sampling 
designs is given below. Let U be the statistic under consideration. 

SRS Design

Let $u_i$ be the observed value of the statistic for the $i$th element,
$f$ be the total sampling fraction,
$n$ be the number of elements in the sample,
$\bar{u}$ be the mean of the $u_i$'s.

Then the variance of the statistic $U$ is estimated by

$$\widehat{\mathrm{Var}}(U) = \frac{1 - f}{n(n-1)}\sum_{i=1}^{n}\left(u_i - \bar{u}\right)^2 \qquad \text{(Woodruff and Causey, 1976)}$$
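For the SRS design the variance step is the familiar estimator for a mean under sampling without replacement, $(1 - f)\,s_u^2/n$, applied to the linearized values. The sketch assumes the statistic has been written on the scale of sample means (so that the variance of the mean of the $u_i$ is what is wanted); the function name and toy values are hypothetical. For the designs in this study, f would be about 150/2354.

```python
from statistics import variance


def srs_variance(u, f):
    """Estimated variance of the mean of u under simple random sampling without
    replacement: (1 - f) * s_u^2 / n, where f is the sampling fraction.
    statistics.variance uses the n - 1 divisor."""
    return (1 - f) * variance(u) / len(u)


u = [2.0, 4.0, 6.0, 8.0]  # toy linearized values
print(srs_variance(u, f=0.1))
```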



STR Design

Let $h = 1, 2, \ldots, H$ be the strata,
$u_{hi}$ be the observed value of the statistic for the $i$th element in the $h$th stratum,
$f_h$ be the sampling fraction in the $h$th stratum,
$n_h$ be the number of elements sampled from the $h$th stratum,
$\bar{u}_h$ be the mean of the $u_{hi}$'s for the $h$th stratum,
$W_h$ be the proportion of the population in stratum $h$.

Then the variance of the statistic $U$ is estimated by

$$\widehat{\mathrm{Var}}(U) = \sum_{h=1}^{H} W_h^2\,\frac{1 - f_h}{n_h(n_h-1)}\sum_{i=1}^{n_h}\left(u_{hi} - \bar{u}_h\right)^2 \qquad \text{(Woodruff and Causey, 1976)}$$
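A stratified version simply sums within-stratum contributions, each weighted by the square of the stratum's population share $W_h = N_h/N$. The sketch below assumes the mean-scale form of the estimator, $\sum_h W_h^2 (1 - f_h) s_h^2 / n_h$; the function name and the toy two-stratum data are hypothetical. With a single stratum and $W_1 = 1$ it reduces to the SRS estimator.

```python
from statistics import variance


def stratified_variance(u_by_stratum, pop_shares, fractions):
    """Estimated variance of a stratified mean of linearized values:
    sum over strata of W_h^2 * (1 - f_h) * s_h^2 / n_h."""
    total = 0.0
    for u_h, w_h, f_h in zip(u_by_stratum, pop_shares, fractions):
        total += w_h ** 2 * (1 - f_h) * variance(u_h) / len(u_h)
    return total


# Toy example with two strata of two elements each
print(stratified_variance([[1.0, 3.0], [2.0, 6.0]], [0.5, 0.5], [0.0, 0.0]))  # 1.25
```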



SCL and CLS Designs 

The most appropriate estimator for these two designs would be one which
took into consideration the use of probability proportional to size
selection and the use of selection without replacement at both stages of
the two-stage design. Such an estimator is described by Sukhatme (1954:410).
However, this estimator involves the use of the probabilities of
selection of the primary sampling units, and of the joint probabilities
of selection of pairs of sampling units. This proved tractable, though
costly, for the case of schools, but when the same computations were
attempted for classes, the practical constraints of working on
busy computer installations meant that the job would never be finished.
This problem is mentioned by Sukhatme, who suggests that

... the use of the estimate appropriate for sampling with
replacement, introducing the usual finite multiplier for
calculating the error variance, is probably sufficiently
satisfactory. (Sukhatme, 1954:415)

As this is the procedure most research workers would follow in any case,
it was decided to heed Sukhatme's advice.
As the two designs are exactly the same apart from the size of the
clusters used, a unified account is given below.

Let $m_i$ be the number of elements in the $i$th cluster,
$m$ be the number of elements in the whole sample,
$n$ be the number of clusters sampled,
$f$ be the overall sampling fraction,
$\bar{u}_i$ be the mean value of the statistic in the $i$th cluster,
$\bar{u}$ be the mean of the $\bar{u}_i$'s.

Then the variance of the statistic $U$ is estimated by

$$\widehat{\mathrm{Var}}(U) = \frac{1 - f}{n(n-1)}\sum_{i=1}^{n}\left(\bar{u}_i - \bar{u}\right)^2 \qquad \text{(Sukhatme, 1954:363)}$$
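For the clustered designs the with-replacement approximation works from the spread of the cluster means alone. The sketch below assumes, as in the SCL and CLS designs, that every selected cluster contributes the same number of elements (25), so the cluster means can be averaged with equal weight; the function name and toy data are hypothetical.

```python
from statistics import mean


def cluster_variance(clusters, f):
    """With-replacement estimator of the variance of the overall mean, built from
    the spread of the n cluster means and multiplied by the finite population
    correction (1 - f). Assumes equal numbers of elements per selected cluster."""
    n = len(clusters)
    cluster_means = [mean(c) for c in clusters]
    grand = mean(cluster_means)
    return (1 - f) * sum((cm - grand) ** 2 for cm in cluster_means) / (n * (n - 1))


# Toy example: three clusters of two linearized values each
print(cluster_variance([[1.0, 3.0], [5.0, 7.0], [9.0, 11.0]], f=0.0))
```

Note that only n - 1 = 5 degrees of freedom are available with six clusters, however many students each contains.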



WTD Design

For this design a version of the previous estimator could be applied within
each stratum, the results weighted according to the relative sizes of the
strata and then added across strata to obtain an estimate for the population
variance. However, when this was attempted, the results proved extremely
unstable due to the presence of only two clusters per stratum.
Two alternatives presented themselves: ignore the stratification and use
the variance estimator for the CLS design, or ignore the clustering and
use the variance estimator for the STR design. As the effects of
clustering had already been investigated for two different designs, it was
decided to pursue the latter strategy. Thus the variance estimator used
was that described for the STR design, with the statistics $u_{hi}$ and $\bar{u}_h$
replaced by the appropriate weighted statistics.



CHAPTER 4 



RESULTS: THE PERFORMANCE OF THE WOODRUFF
TECHNIQUE FOR ESTIMATING SAMPLING ERRORS



This chapter discusses the performance of the Woodruff technique,
as applied in the Woodruff-Causey program, in estimating
sampling errors for the five designs using the five variance
estimators described in the previous chapter. The first section discusses
the evaluation techniques used, the second examines the results for the
SRS design, the third examines the results for the stratified designs and
the fourth examines the results for the clustered designs. The fifth
section compares these results with those obtained in a previous study,
and the final section summarises these results.

4.1 The Evaluation Techniques

In discussing the effects of sample design, three types of evaluation
procedure were used. The first measures the relationship between the
estimates of sampling error obtained from the Woodruff-Causey program and
the 'true' sampling errors which were derived empirically. The second type
of evaluation relates to the internal consistency of the sampling error
estimates obtained from the Woodruff-Causey program. The third
type of evaluation investigates the extent to which the studentized ratios
are distributed as a 't' statistic around their mean, which bears upon
their usefulness for hypothesis testing.

Design Effect. In order to establish a criterion for choosing between
sample designs, Kish introduced the word 'Deff', derived from 'design
effect', to name

the ratio of the actual variance of a sample to the variance of
a simple random sample of the same number of elements (Kish,
1965:258).

Thus, if an estimator $\hat{u}$ of a population parameter $u$ is used under a
complex sampling design $C$, then a measure of its efficiency is

$$\mathrm{Deff}(\hat{u}, C) = \frac{\mathrm{Var}(\hat{u}_c)}{\mathrm{Var}(\hat{u}_{srs})} \qquad (1)$$
where the subscript $c$ indicates that the estimator is applied with the complex sample
design $C$, whereas $srs$ indicates that a simple random sample of the same
size was used. Note that Deff depends upon both the sample design
and the estimator $\hat{u}$. Usually the relevant design and estimator are obvious
and the arguments are left out.

As the discussion of the effects of sample design is usually couched
in terms of sampling errors rather than sampling variances, a more
appropriate criterion is the design factor or 'deft', which is defined by

$$\mathrm{deft}(\hat{u}, C) = \sqrt{\mathrm{Deff}(\hat{u}, C)} \qquad \text{(Verma et al., 1980)} \quad (2)$$

Ross (1976) made all his comparisons using this measure. It has been
pointed out that deft appears less sensitive to sampling errors than Deff
(Kish, 1969:334).

In the interests of obtaining some stability in deft values, Kish
and Frankel (1970:1092) recommend that particular values of deft be obtained
for each instance of each type of statistic and that the average of these
values should be reported as deft. Of course, such an averaging process
must be confined to particular types of statistics, owing to differences in
units of measurement, in sample size and in the variances of the
variates involved in calculating the estimator.

The 'true' values of the various statistics were found using the SPSS
collection of programs with 'list-wise' deletion. This means that the
population parameters are slightly different from those quoted in Ross's
study; this is not a problem, as all the most important comparisons to be
made were based on fresh samples. These values are given in Table 4.1.1
and will, for the purpose of this investigation, be considered true
population parameters. The multiple correlation coefficients in this table
are named by the criterion variable for the appropriate regression equation.

Table 4.1.1 Population Values for the Statistics Used in the Study
(S = SEX, F = FOCCUP, L = LIKESCHL, E = EXPEDN, M = MATHS)

Statistic                            Population Value

Means:
  SEX                                  1.4731
  FOCCUP                               3.1175
  LIKESCHL                            21.3732
  EXPEDN                               4.2840
  MATHS                               29.5415

Correlation Coefficients:
  SF                                  -0.01123
  SL                                   0.14908
  SE                                  -0.09723
  SM                                  -0.07560
  FL                                  -0.13988
  FE                                  -0.41609
  FM                                  -0.37256
  LE                                   0.39518
  LM                                   0.21185
  EM                                   0.51094

Path Coefficients:
  SL                                   0.14752
  SE                                  -0.15609
  SM                                  -0.04150
  FL                                  -0.13822
  FE                                  -0.36648
  FM                                  -0.19684
  LE                                   0.36719
  LM                                   0.02672
  EM                                   0.41444

Multiple Correlation Coefficients:
  LIKESCHL                             0.20329
  EXPEDN                               0.55926
  MATHS                                0.54211

Calculation of design effects depends upon finding a good estimate of
the standard error which would obtain under simple random sampling with the
same number of sample cases as was used in the complex sample. The formulae
used to calculate these simple random sampling standard errors were the
same as those used by Ross (1976:29-30), which were detailed in Table 3.2.4.

In using these formulae, an estimate of the population standard
deviation for each variable, and of the relevant Multiple Correlation
Coefficients, was found using the entire population. The formula standard
error was then found using the appropriate number of sample cases.
This provides a 'best estimate' of the standard deviation that would be
obtained from a simple random sample. In practice, a researcher would
almost always have to use the sample obtained by the complex sampling
process to estimate $\sigma_{srs}$. Such estimates would vary greatly depending
on the particular sampling scheme in use. Use of the 'best estimate'
provides a stable standard against which to compare both the empirical
and the estimated standard errors of the complex designs.

These concepts were implemented according to the following formulae.

If $f_i$ is the estimate of the function $f$ resulting from the $i$th sample,
then the average, $\bar{f}$, is given by

$$\bar{f} = \frac{1}{25}\sum_{i=1}^{25} f_i \qquad (3)$$

and the empirical estimate of the standard deviation, $\sigma_f$, is given by

$$\sigma_f = \sqrt{\frac{1}{25}\left(\sum_{i=1}^{25} f_i^2 - 25\,\bar{f}^{\,2}\right)} \qquad (4)$$

Furthermore, if $f$ is the 'true' value of the function (i.e. that derived
from population data) and $s_f$ is the simple random sample standard deviation
derived from the formulae in Table 3.2.4, then the bias of $f$ is given by

$$\mathrm{bias}(f) = \bar{f} - f \qquad (5)$$

the Mean Square Error of $f$ is given by

$$\mathrm{M.S.E.}(f) = \left[\mathrm{bias}(f)\right]^2 + \sigma_f^2 \qquad (6)$$

and an empirical estimate of deft is given by

$$\mathrm{deft}(f, C) = \frac{\sigma_f}{s_f} \qquad (7)$$

where $C$ denotes the complex sampling design under consideration. In
addition, the 'deft error' was also calculated; by this is meant the
percentage error incurred by assuming that deft equals one; that is,

$$\text{deft error} = \frac{1 - \mathrm{deft}}{\mathrm{deft}} \times 100 \qquad (8)$$

Thus, a deft error of -26.43% indicates that if one used the simple random
sampling version of the sampling error, one would be using an error estimate
which was 26.43% below the correct figure.
Recourse to a table of the probability distribution of Student's 't'
statistic on the appropriate number of degrees of freedom (for example,
Pearson and Wishart, 1947:118-119) will then provide an interpretation of
this error in terms of true and apparent confidence intervals.
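The replication-based quantities can be computed together. The sketch below is a minimal illustration with a hypothetical function name and toy inputs. It uses the replicate count as divisor, so that M.S.E. = bias² + σ_f² holds exactly. It also assumes that the empirical deft is σ_f / s_f, where s_f is the simple-random-sampling formula standard error; that reading of equation (7) is an assumption of this sketch.

```python
import math


def empirical_summary(estimates, true_value, srs_se):
    """Replication-based summaries for one statistic (divisor = number of replicates).
    The empirical deft is taken here as sd / srs_se (an assumption)."""
    k = len(estimates)
    avg = sum(estimates) / k
    sd = math.sqrt(sum((e - avg) ** 2 for e in estimates) / k)
    bias = avg - true_value
    deft = sd / srs_se
    return {
        "mean": avg,
        "sd": sd,
        "bias": bias,
        "mse": bias ** 2 + sd ** 2,
        "deft": deft,
        "deft_error": (1 - deft) / deft * 100,  # % error from assuming deft = 1
    }


# Toy replicate estimates of a correlation whose true value is 0.149
print(empirical_summary([0.12, 0.18, 0.15, 0.10, 0.20], 0.149, srs_se=0.025))
```

A deft of about 1.359, for instance, gives the deft error of -26.43% quoted above.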

These values were then compared with the Woodruff-Causey estimated
standard errors in the following way.

If $s_{f_i}$ is the $i$th estimate of the standard error of function $f$, then
the average standard error is simply

$$\bar{s}_f = \frac{1}{25}\sum_{i=1}^{25} s_{f_i} \qquad (9)$$

the $i$th estimate of deft is given by

$$\widehat{\mathrm{deft}}(f_i, C) = \frac{s_{f_i}}{s_f} \qquad (10)$$

and the average deft is

$$\overline{\mathrm{deft}}(f, C) = \frac{1}{25}\sum_{i=1}^{25}\widehat{\mathrm{deft}}(f_i, C) \qquad (11)$$

A percentage error involving this formula was also calculated using the
formula

$$\text{deft error} = \frac{\overline{\mathrm{deft}} - \mathrm{deft}}{\mathrm{deft}} \times 100 \qquad (12)$$

where deft refers to the empirical value and $\overline{\mathrm{deft}}$ refers to the average
estimated deft for the function $f$.
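The comparison of estimated with empirical deft reduces to two small functions, sketched below with hypothetical names. As a check, the Means row of Table 4.2.2 (empirical 0.9949, estimated 0.9675) should give a percentage error of about -2.8.

```python
def average_estimated_deft(se_estimates, srs_se):
    """Mean over replicates of the estimated deft s_fi / s_f."""
    return sum(s / srs_se for s in se_estimates) / len(se_estimates)


def estimator_percent_error(avg_estimated_deft, empirical_deft):
    """Percentage error of the average estimated deft relative to the empirical deft."""
    return (avg_estimated_deft - empirical_deft) / empirical_deft * 100


# Means row of Table 4.2.2: empirical 0.9949, estimated 0.9675
print(round(estimator_percent_error(0.9675, 0.9949), 1))  # -2.8
```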

Relative Mean Square Error. The internal consistency of the Woodruff-Causey
estimates of standard error was investigated using the following
statistics.

If $s_{f_i}$ is the $i$th estimate of the standard error, and $\bar{s}_f$ is the
average over the 25 samples, the standard deviation of the standard errors
is given by

$$\mathrm{st.dev.}(s_f) = \sqrt{\frac{1}{25}\left(\sum_{i=1}^{25} s_{f_i}^2 - 25\,\bar{s}_f^{\,2}\right)} \qquad (13)$$

the bias is given by

$$\mathrm{bias}(s_f) = \bar{s}_f - \sigma_f \qquad (14)$$
and the Mean Square Error is given by

$$\mathrm{M.S.E.}(s_f) = \left[\mathrm{bias}(s_f)\right]^2 + \left[\mathrm{st.dev.}(s_f)\right]^2 \qquad (15)$$

As these statistics gain meaning only by comparison with the
variability of the original function, $f$, and in order to allow comparison
across functions which have different magnitudes, the Relative Mean Square
Error was also calculated:

$$\mathrm{RELMSE}(s_f) = \frac{\mathrm{M.S.E.}(s_f)}{\sigma_f^2} \qquad (16)$$

This can be broken down into two terms, Relative Bias and Relative
Variance, given by

$$\mathrm{RELBIAS}(s_f) = \frac{\left[\mathrm{bias}(s_f)\right]^2}{\sigma_f^2} \qquad (17)$$

$$\mathrm{RELVAR}(s_f) = \frac{\left[\mathrm{st.dev.}(s_f)\right]^2}{\sigma_f^2} \qquad (18)$$

Of course, RELMSE = RELVAR + RELBIAS.

These statistics were those used by Frankel (1971:61-77), except that
he investigated the variance rather than the standard deviation. In
keeping with the use of 'deft' rather than 'Deff', it was decided that
measures of the internal consistency of the standard error were more
appropriate in this investigation.
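The decomposition RELMSE = RELBIAS + RELVAR holds exactly when the replicate count is used as divisor. The sketch below demonstrates this. It assumes the 'true' standard error against which bias is measured is σ_f, the empirical standard deviation of the statistic over the replicates; the function name and toy values are hypothetical.

```python
import math


def relative_mse(se_estimates, true_sd):
    """Relative bias, relative variance and RELMSE of replicate standard-error
    estimates, each scaled by the squared 'true' standard deviation of f."""
    k = len(se_estimates)
    avg = sum(se_estimates) / k
    sd = math.sqrt(sum((s - avg) ** 2 for s in se_estimates) / k)
    rel_bias = (avg - true_sd) ** 2 / true_sd ** 2
    rel_var = sd ** 2 / true_sd ** 2
    return rel_bias, rel_var, rel_bias + rel_var


rb, rv, rmse = relative_mse([0.021, 0.026, 0.019, 0.024, 0.023], true_sd=0.025)
print(round(rb, 4), round(rv, 4), round(rmse, 4))
```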

Student's t. The third type of evaluation also follows the lead given
by Frankel (1971). There he examined the assumption:

The distribution of the ratio of the first-order estimate minus,
its expected value, to its estimated standard error is reasonably
approximated by Students' t within symmetric intervals.
(Frankel, 1971:78).

This assumption is crucial to the interpretation of the sampling errors
derived from the Woodruff-Causey program. If the assumption is tenable,
then credible inferences using the t-distribution can be made from the
samples; if the assumption is not tenable, then the standard errors could
still be utilized in a Tchebytchev-type inequality, but such results
would be extremely conservative.
Table 4.1.2 Proportion of Student's t Within Selected Intervals

Degrees of Freedom          ±2.576     ±1.960     ±1.645

3                           0.9196     0.8549     0.8124
4                           0.9384     0.8782     0.8244
5                           0.9503     0.8925     0.8502
[Standard normal case]      0.9900     0.9500     0.9000

Note: These proportions were (where necessary) calculated by linear
interpolation from a table of the probability integral of
Student's t in Pearson and Wishart (1947:118-119).



The investigation consisted of finding the proportion of times the
ratio

$$\frac{f_i - f}{s_{f_i}} \qquad (19)$$

fell within the intervals (-2.576, 2.576), (-1.960, 1.960) and (-1.645,
1.645). These proportions were then compared with those predicted by
'Student's t' on an appropriate number of degrees of freedom.
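Counting how often the studentized ratio falls inside each symmetric interval is straightforward. The sketch below uses a hypothetical function name and toy values. It computes the proportion of ratios (f_i - f)/s_fi lying strictly inside (-c, c) for the three cut-offs used in the study.

```python
def coverage(estimates, se_estimates, true_value, cutoffs=(2.576, 1.960, 1.645)):
    """Proportion of studentized ratios (f_i - f) / s_fi lying strictly inside
    (-c, c) for each cutoff c."""
    ratios = [(f_i - true_value) / s for f_i, s in zip(estimates, se_estimates)]
    return {c: sum(abs(t) < c for t in ratios) / len(ratios) for c in cutoffs}


# Toy replicate estimates and their estimated standard errors
print(coverage([0.14, 0.19, 0.09, 0.16], [0.02, 0.025, 0.02, 0.03], 0.149))
```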

Table 4.1.2 gives the Student's t proportions that were used for
comparison.

For the non-stratified designs using a simple random sample variance
estimator, the appropriate number of degrees of freedom is the number of
sample cases minus one. Stratified designs usually take the number of
cases minus the number of strata, but the presence of unequal stratum
sizes and of weighting makes this only an approximation. Frankel,
investigating a series of designs involving many strata but only two cases per
stratum, hypothesized that the number of degrees of freedom was equal to
the number of strata (Frankel, 1971:79); this point is discussed in Section 4.4.

For the Jackknife variance estimator the appropriate number of degrees
of freedom is one less than the number of distinct pseudovalues (Mosteller
and Tukey, 1977:30). For all the Jackknife examples used in this study,
there were six different pseudovalues, so the number of degrees of freedom
was five. For the Balanced Repeated Replication variance estimator, the
number of degrees of freedom was four. For the variance estimator used in
the SCL and CLS designs the number of degrees of freedom is the number of
clusters minus one; in this case, five.
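The Student's t proportions in Table 4.1.2 can also be computed directly by integrating the t density rather than interpolating in a printed table. The sketch below uses Simpson's rule and Python's `math.lgamma` (function names are hypothetical). Because the table was built by linear interpolation, its entries may differ from these exact values in the third or fourth decimal place; for 3 degrees of freedom at ±2.576, for example, the exact value is close to 0.918.

```python
import math


def t_pdf(t, df):
    """Density of Student's t on df degrees of freedom (log-gamma for stability)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)


def prop_within(c, df, steps=2000):
    """P(-c < T < c) by Simpson's rule on [0, c], doubled by symmetry (steps must be even)."""
    h = c / steps
    total = t_pdf(0.0, df) + t_pdf(c, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3


for df in (3, 4, 5):
    print(df, [round(prop_within(c, df), 4) for c in (2.576, 1.960, 1.645)])
```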




Table 4.2.1 Average Deft Estimates for each Statistic
(Average 'N' was 145.80)

Function           Empirical    Estimated    Percent error    Percent error
                                             of estimator     of formula

Means:
  SEX              1.0057       0.9691        -3.6             -0.6
  FOCCUP           0.9423       0.9593         1.8              6.1
  LIKESCHL         1.1018       0.9610       -12.8             -9.2
  EXPEDN           1.0507       0.9842        -6.3             -4.8
  MATHS            0.8741       0.9629        10.2             14.4

Correlation Coefficients:
  SF               0.9787       0.9667        -1.2              2.2
  SL               [illegible]  0.9325        27.4             36.6
  SE               0.8421       0.9610        14.1             18.7
  SM               1.1150       0.9486       -14.9            -10.3
  FL               1.0966       0.9003       -17.9             -8.8
  FE               0.7693       0.8174         6.3             30.0
  FM               0.8022       0.8170         1.8             24.7
  LE               1.0302       0.8266       -19.8             -2.9
  LM               0.9649       0.8917        -7.6              3.6
  EM               0.7169       0.6826        -4.8             39.5

Path Coefficients:
  SL               0.7524       0.9295        23.5             32.9
  SE               0.8634       0.9368         8.5             15.8
  SM               1.1406       0.9068       -20.5            -12.3
  FL               1.1320       0.9008       -20.4            -11.7
  FE               1.1493       0.9367       -18.5            -13.0
  FM               0.9582       0.9530        -0.5              4.4
  LE               1.1172       0.9695       -13.2            -10.5
  LM               1.0471       0.9413       -10.1             -4.5
  EM               0.9078       0.8836        -2.7             10.2

Multiple Correlation Coefficients:
  LIKESCHL         0.8662       0.8852         2.2             15.5
  EXPEDN           0.6620       0.6371        -3.8             51.1
  MATHS            0.6756       0.6146        -7.5             48.0

Note: Values recorded in columns 1, 2 and 4 could be improved by making
corrections for cases where r ≠ 0 or R ≠ 0.
4.2 Results for the SRS Design

The application of the Woodruff technique to this design may seem
superfluous; after all, estimators for sampling errors for this design are well
established. The investigation is important, however, firstly because it
provides a bench-mark against which to compare the results for all the other
sample designs, and secondly because the 'formula' sampling errors quoted
in Table 3.2.4 are all dependent upon some sort of normal-distribution
assumption, which may not be appropriate. In addition, it should be noted
that the formulae in Table 3.2.4 are appropriate for sampling with
replacement from infinite normal populations. The methods used in this
investigation relate to sampling from finite populations without replacement.

The average of deft for each of the statistics is given in Table
4.2.1. The first column gives the empirical values of deft obtained from
the 25 simulations. The degree of variation from 1 indicates just how
tenable the 'formula' standard error was; the empirical values range from
0.66 to 1.14, which indicates that the non-normal nature of the distributions
of the variables is having considerable influence on the sampling errors of
the statistics. The second column gives the estimated value of deft given
by 25 applications of the Woodruff technique. It is striking that, except
for the Multiple Correlation Coefficients, the values in this column show
much less variation than those in the previous column.

This in itself is not necessarily a problem. If one is concerned
primarily with the quality of the approximation for each of the statistics,
it is worrisome; however, if the aim is to arrive at a reasonable deft
estimate for each type of statistic, it need not be a problem at all.
Kish and Frankel (1970:1092) recommend exactly this latter course, and in
the main their advice is followed here, although in some cases comment
is made on individual statistics. The third column gives the error in the
deft estimate relative to the empirical deft. The worst error is 27% for
the correlation between SEX and LIKESCHL. The final column gives the error
involved in using the 'formula' version of sampling error (that is, assuming
that deft is 1) relative to the empirical situation. The worst errors in
this column are, for individual statistics, considerably worse than for the
previous column.

The information contained in 4.2.1 is summarised by type of statistic 
in Table 4.2.2. The Woodruff technique is providing a slight underestimate 
of deft that is no more than 10 per cent in error. The formula 1 estimate 
Table 4.2.2 Average Deft Estimates (SRS samples)
Average 'N' was 145.81

                                                       Percent error   Percent error
Statistic                          Empirical  Estimated  of estimator    of formula

Means                              0.9949     0.9675     -2.8             0.5
Correlation Coefficients           0.9048     0.8744     -3.4            10.5
Path Coefficients                  1.0076     0.9287     -7.8            -0.7
Multiple Correlation Coefficients  0.7346     0.7157     -2.6            36.1

of deft is relatively better for the Means and the Path Coefficients and
relatively poorer for the Correlation Coefficients and the Multiple
Correlation Coefficients. One way of assessing the importance of these
errors is to examine the real meaning that 95 per cent confidence intervals
would have if these erroneous deft values were used. Table 4.2.3 gives
the probability of an incorrect statement if a two-sided 95 per cent
confidence interval is used: the probability should be 0.050. The
'formula' standard error for Multiple Correlation Coefficients is found to
be very conservative, but all the rest would most probably be acceptable
to most educational researchers.
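Most of the probabilities in Table 4.2.3 can be approximately reproduced under a simple model. Suppose the true standard error is proportional to the empirical deft, but the interval is built from the estimated (or 'formula') deft. A nominal interval of plus or minus 1.96 standard errors then misses with probability P(|Z| > 1.96 * deft_used / deft_true).

The sketch below assumes that model and uses the normal reference distribution, which is appropriate here given the large number of degrees of freedom. The function name is illustrative only.

```python
import math

Z_975 = 1.959963984540054  # two-sided 95% critical value of N(0, 1)


def p_incorrect_normal(deft_used, deft_true, z=Z_975):
    """Probability that a nominal 95% interval misses.

    The interval uses a standard error proportional to deft_used, while the
    true standard error is proportional to deft_true.
    """
    x = z * deft_used / deft_true
    return math.erfc(x / math.sqrt(2.0))  # equals P(|Z| > x)


# SRS Means (Table 4.2.2): 'formula' (deft = 1) and the Woodruff estimate.
# Table 4.2.3 prints 0.049 and 0.057 for these two cases.
print(round(p_incorrect_normal(1.0, 0.9949), 3))
print(round(p_incorrect_normal(0.9675, 0.9949), 3))
```

For the Path Coefficients the model gives about 0.071 against the printed 0.069. This suggests the table may have been built from per-statistic rather than averaged defts.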

Table 4.2.3 Probability of an Incorrect Statement About the Statistics
in the SRS Design

                                   Probability of incorrect statement
                                   when a two-sided 95% confidence
                                   interval is to be used

Statistic                          'Formula'    Woodruff estimate

Means                              0.049        0.057
Correlation Coefficients           0.030        0.058
Path Coefficients                  0.052        0.069
Multiple Correlation Coefficients  0.008        0.056
Table 4.2.4 Average Bias and Variance Contributions to the Relative Mean
Square Errors for the Statistics in the SRS Design

Statistic                          Relative   Relative   Relative Mean
                                   Bias       Variance   Square Error

Means                              0.006      0.002      0.008
Correlation Coefficients           0.020      0.004      0.025
Path Coefficients                  0.023      0.004      0.028
Multiple Correlation Coefficients  0.002      0.012      0.015



Table 4.2.4 gives the contributions of the Bias and the Variance to
RELMSE for the statistics under study. The variance contribution is very
stable for the first three statistics, never exceeding one part in a
hundred. Thus the variance estimator is about 1 per cent as variable as the
statistic itself. For some individual statistics the Bias component is
smaller than the Variance component, but on average, for all three types of
statistics, the Variance component is much smaller than the Bias component.
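The decomposition in Table 4.2.4 rests on the identity MSE = bias^2 + variance, divided through by the square of the true value. The study's own definitions appear in an earlier chapter that is not reproduced here, so the sketch below assumes the usual ones.

It applies them to replicated estimates of a single quantity, such as a deft. Variances are taken over the K replications with divisor K, so the two components add exactly to RELMSE. The replicate values in the test are made up.

```python
def relmse_components(estimates, true_value):
    """Split the relative mean square error of replicated estimates.

    Returns (relative bias, relative variance, RELMSE), where
    relative bias = ((mean - true) / true)**2,
    relative variance = var / true**2 (divisor K),
    RELMSE = mean((estimate - true)**2) / true**2.
    """
    k = len(estimates)
    mean_est = sum(estimates) / k
    rel_bias = ((mean_est - true_value) / true_value) ** 2
    rel_var = sum((x - mean_est) ** 2 for x in estimates) / k / true_value ** 2
    relmse = sum((x - true_value) ** 2 for x in estimates) / k / true_value ** 2
    return rel_bias, rel_var, relmse
```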

For the Multiple Correlation Coefficients the situation is reversed,
with the estimator showing considerable variability
but, on average, settling down to a good estimate. This contrary
behaviour is echoed in the other designs.

The proportion of times that the 't' ratio falls within certain
intervals for each type of statistic is given in Table 4.2.5. The
appropriate number of degrees of freedom is 145, which is approximated by
the entries for infinite degrees of freedom in Table 4.1.1. The results
are tolerably close to the theoretically correct proportions, except for
the Path Coefficients, which seem slightly more spread out than a true
t-distribution.
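The 't' ratio for each replication is (estimate - true value) / estimated standard error. The table entries are the shares of replications whose ratio lies inside plus or minus 2.576, 1.960 and 1.645. With infinite degrees of freedom these should be close to 0.99, 0.95 and 0.90.

The sketch below shows the tally; its function name and inputs are hypothetical.

```python
def t_ratio_coverage(estimates, std_errors, true_value,
                     limits=(2.576, 1.960, 1.645)):
    """Share of replications whose 't' ratio lies within +/- each limit."""
    ratios = [(est - true_value) / se for est, se in zip(estimates, std_errors)]
    k = len(ratios)
    return {c: sum(1 for t in ratios if abs(t) <= c) / k for c in limits}
```

Whether a ratio exactly on a limit counts as inside makes no practical difference for continuous data.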



Table 4.2.5 Proportion of Times that 't' Ratio Falls Within Selected
Intervals (SRS samples)

Statistic                          ±2.576   ±1.960   ±1.645

Means                              0.992    0.936    0.872
Correlation Coefficients           0.984    0.944    0.896
Path Coefficients                  0.978    0.933    0.853
Multiple Correlation Coefficients  0.975    0.947    0.893
Table 4.3.1 Average Deft Estimates (SCL samples)
Average 'N' was 145.00

                                                       Percent error
Statistic                          Empirical  Estimated  of estimator

Means                              1.4973     1.8113      21.0
Correlation Coefficients           1.0098     0.9676      -4.2
Path Coefficients                  0.9998     1.0022       0.2
Multiple Correlation Coefficients  0.6782     2.6529     291.2



4.3 Results for the Clustered Designs: SCL and CLS

As these two designs are identical except for the relative sizes and
nature of the clusters, their results are best considered together. Deft
estimates for the two designs are listed in Tables 4.3.1 and 4.3.2: one
is immediately struck by the huge overestimate for the Multiple Correlation
Coefficients. The other statistics seem to be reasonably well estimated.
The probabilities given in Table 4.3.3 are, except for the Multiple
Correlation Coefficients, just a little worse than those for the SRS design.
Note that the basis of the calculation of these probabilities differs
from that used for the previous design, as there are now only five degrees of
freedom involved in the variance estimation formula.
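With only five degrees of freedom, the critical value comes from Student's t rather than the normal distribution. The sketch below assumes a nominal interval misses with probability P(|T| > t(5, 0.975) * deft_used / deft_true). This is an assumed reading of how the probabilities were computed.

The sketch uses the closed-form t distribution function for integer degrees of freedom (Abramowitz and Stegun, 26.7.3 and 26.7.4), plus a bisection for the critical value, so it needs only the standard library. With the SCL Means defts of Table 4.3.1 it gives about 0.027, in line with Table 4.3.3. Function names are illustrative.

```python
import math


def t_two_sided_tail(t, df):
    """P(|T| > t) for Student's t with integer df >= 1 (closed-form series)."""
    theta = math.atan(abs(t) / math.sqrt(df))
    s, c = math.sin(theta), math.cos(theta)
    series, term = 0.0, 1.0
    if df % 2 == 1:
        # odd df: A = (2/pi)(theta + s*c*(1 + (2/3)c^2 + (2*4)/(3*5)c^4 + ...))
        for k in range(1, (df - 1) // 2 + 1):
            series += term
            term *= c * c * (2 * k) / (2 * k + 1)
        a = 2.0 / math.pi * (theta + s * c * series)
    else:
        # even df: A = s*(1 + (1/2)c^2 + (1*3)/(2*4)c^4 + ...)
        for k in range(1, df // 2 + 1):
            series += term
            term *= c * c * (2 * k - 1) / (2 * k)
        a = s * series
    return 1.0 - a  # a is P(|T| < t)


def t_critical(df, alpha=0.05):
    """Two-sided critical value, found by bisection on the tail function."""
    lo, hi = 0.0, 1000.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if t_two_sided_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def p_incorrect_t(deft_used, deft_true, df):
    """Miss probability of a nominal 95% interval built from deft_used."""
    return t_two_sided_tail(t_critical(df) * deft_used / deft_true, df)


# SCL Means, Table 4.3.1: estimated 1.8113 against empirical 1.4973.
print(round(p_incorrect_t(1.8113, 1.4973, 5), 3))
```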

One way of considering these results is to calculate the 'effective
sample size' for the two designs (Kish, 1965:259). This is the size of a
simple random sample over the same variable which would give standard
errors of the same size as were found here. Ross (1976:8) has given an
approximate formula for the effective sample size in the case of the mean.
If the population size is large compared to the sample size n, then the
effective sample size n* is given by

$$ n^{*} \approx \frac{n}{\mathrm{deft}^{2}} $$

Using this formula, the effective sample size for the Means in the SCL
design is approximately 65, and for the CLS design, it is approximately 30.
This certainly provides grounds for explaining the lowered performance of
the estimator in the case of Means. Unfortunately no such formula
Table 4.3.2 Average Deft Estimates (CLS samples)
Average 'N' was 144.96

                                                       Percent error
Statistic                          Empirical  Estimated  of estimator

Means                              2.2068     2.4424      10.7
Correlation Coefficients           1.2173     1.1047      -9.3
Path Coefficients                  1.1664     1.0739      -8.0
Multiple Correlation Coefficients  1.1080     3.4081     207.6



is available for the other statistics, although one might speculate that
n* for the more complicated statistics would be closely related to n* for
the Means. If this is true, then perhaps an explanation could be put forward
for the poor behaviour of the estimator in the case of the Multiple
Correlation Coefficients, on the grounds that, with an effective sample size
of 65 or 30, Multiple Correlation Coefficients themselves have little
meaning or stability, and hence the calculation of sampling errors is not
warranted.
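The figures of 65 and 30 quoted above can be checked with the familiar Kish form of the effective sample size, n* = n / deft^2, assuming a large population. Whether Ross's (1976:8) approximation takes exactly this form is an assumption here. It does reproduce both figures when the empirical deft for Means and the average 'N' from Tables 4.3.1 and 4.3.2 are used.

```python
def effective_sample_size(n, deft):
    """Approximate effective sample size n* = n / deft**2.

    Assumes a large population, so the finite population correction is ignored.
    """
    return n / deft ** 2


# Average 'N' and empirical deft for Means, Tables 4.3.1 and 4.3.2.
print(round(effective_sample_size(145.00, 1.4973)))  # SCL design: 65
print(round(effective_sample_size(144.96, 2.2068)))  # CLS design: 30
```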

The bias and variance contributions to RELMSE are given in Table 4.3.4.
The situation for defts is reflected here: the results for the
statistics other than Multiple Correlation Coefficients are reasonable but
not as good as for the SRS design, and they are generally similar for both

Table 4.3.3 Probability of an Incorrect Statement About the Statistics
in the SCL and CLS Designs

                                   Probability of incorrect statement
                                   when a two-sided 95% confidence
                                   interval is to be used

Statistic                          SCL design   CLS design

Means                              0.027        0.036
Correlation Coefficients           0.057        0.067
Path Coefficients                  0.050        0.064
Multiple Correlation Coefficients  <<0.001      <<0.001
Table 4.3.4 Bias and Variance Contributions to the Relative Mean Square
Error for the Statistics in the SCL and CLS Designs

Statistic                              Relative   Relative   Relative Mean
                                       Bias       Variance   Square Error

SCL Means                              0.111      0.165      0.275
CLS Means                              0.030      0.094      0.125
SCL Correlation Coefficients           0.020      0.094      0.114
CLS Correlation Coefficients           0.021      0.100      0.121
SCL Path Coefficients                  0.013      0.098      0.111
CLS Path Coefficients                  0.011      0.096      0.107
SCL Multiple Correlation Coefficients  12.6       3.8        16.2
CLS Multiple Correlation Coefficients  7.8        2.5        10.2



designs. The size of the RELMSE for the Multiple Correlation Coefficients
implies that no credence could be given to the values obtained.

Table 4.3.5 gives the proportion of times that the 't' ratio falls
within certain intervals for each type of statistic. The appropriate
number of degrees of freedom is 5, and the theoretically correct proportions
are given in Table 4.1.1. The Multiple Correlation Coefficients do not



Table 4.3.5 Proportion of Times that 't' Ratio Falls Within Selected
Intervals (SCL and CLS samples)

Statistic                              ±2.576   ±1.960   ±1.645

SCL Means                              0.976    0.904    0.848
CLS Means                              0.968    0.920    0.888
SCL Correlation Coefficients           0.952    0.904    0.840
CLS Correlation Coefficients           0.932    0.856    0.796
SCL Path Coefficients                  0.960    0.916    0.880
CLS Path Coefficients                  0.960    0.876    0.804
SCL Multiple Correlation Coefficients  0.880    0.856    0.827
CLS Multiple Correlation Coefficients  0.920    0.880    0.867
Table 4.4.1 Average Deft Estimates (STR samples)
Average 'N ' was i«J4 .56 



                                                       Percent error
Statistics                         Empirical  Estimated  of estimator

Means                              0.7419     0.5629     -24.1
Correlation Coefficients           0.7700     0.5120     -33.5
Path Coefficients                  0.8532     0.5446     -36.2
Multiple Correlation Coefficients  0.7203     0.6527      -9.4



seem quite so disastrous in this table, but in fact the averaging process
has concealed three extreme results. The other statistics seem to be
giving a reasonable approximation to a 't' distribution, with the
Correlation Coefficients in the CLS design being more spread out than
the rest.

4.4 Results for the Stratified Samples: STR and WTD

The deft estimates for both these designs are given in Tables 4.4.1 and
4.4.2.

For all cases but one, the Woodruff estimator is considerably lower
than one would wish. When this is converted to a probability statement in
Table 4.4.3, the interpretation is clear: with the possible exception of
Multiple Correlation Coefficients, the Woodruff estimator is considerably
biased. These calculations were carried out on the assumption that the
appropriate number of degrees of freedom was the number of sample cases
minus the number of strata; this is the way that Frankel calculated degrees
of freedom in his study (Frankel, 1971:79). He expressed the situation as
'the hypothesized degrees of freedom are H, the number of strata . . .', which,
as he was working with only two cases per stratum, works out to the same as
the usual formula. Suppose, however, that the quoted hypothesis were correct
no matter how many cases there were in each stratum. If this were true,
then the probabilities would have to be recalculated on the basis of only
three degrees of freedom. This has been done, and the results are shown in
parentheses beside the original figures in Table 4.4.3. These latter
results are more reasonable than the former, but are still not very
encouraging.
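The effect of the degrees-of-freedom assumption can be seen by repeating the calculation for the STR Means with both reference distributions. With n - H (about 140) degrees of freedom, the normal distribution is a close stand-in. With H = 3, the closed-form t distribution with three degrees of freedom applies.

The sketch below assumes that a nominal 95 per cent interval misses whenever |T| exceeds the critical value multiplied by deft_est / deft_emp. Function names are illustrative. With the STR Means defts of Table 4.4.1 it gives about 0.137 and 0.095, matching Table 4.4.3.

```python
import math

Z_975 = 1.959963984540054   # normal critical value, two-sided 95%
T3_975 = 3.182446305284263  # Student t critical value, 3 df, two-sided 95%


def tail_normal(x):
    """P(|Z| > x) for a standard normal variable."""
    return math.erfc(abs(x) / math.sqrt(2.0))


def tail_t3(x):
    """P(|T| > x) for Student's t with 3 df: 1 - (2/pi)(theta + sin*cos)."""
    theta = math.atan(abs(x) / math.sqrt(3.0))
    return 1.0 - 2.0 / math.pi * (theta + math.sin(theta) * math.cos(theta))


def p_incorrect(deft_est, deft_emp, three_df=False):
    """Miss probability of a nominal 95% interval under either assumption."""
    ratio = deft_est / deft_emp
    if three_df:
        return tail_t3(T3_975 * ratio)
    return tail_normal(Z_975 * ratio)


# STR Means, Table 4.4.1: estimated 0.5629 against empirical 0.7419.
print(round(p_incorrect(0.5629, 0.7419), 3))
print(round(p_incorrect(0.5629, 0.7419, three_df=True), 3))
```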
Table 4.4.2 Average Deft Estimates (WTD samples)
Average 'N' was 140.06

                                                       Percent error
Statistics                         Empirical  Estimated  of estimator

Means                              2.6408     1.0809     -59.1
Correlation Coefficients           1.4477     1.0084     -30.3
Path Coefficients                  1.4680     1.0428     -29.0
Multiple Correlation Coefficients  1.2249     1.4317      16.9



An alternative explanation of these poor STR results is to consider
the number of cases used to estimate the statistics within each stratum.
Table 3.2.3 indicates that 103 cases from stratum 1 were used, 34 from
stratum 2, and only 13 from stratum 3. When the Woodruff-Causey
program is used in its stratified mode, separate estimates of all the derivatives
are made for each stratum for each relevant variate. There is only one
such variate for each of the Means, but there are five for each of the
Correlation Coefficients and up to 20 for the Path Coefficients and
Multiple Regression Coefficients. It would seem a dubious practice to
calculate 20 derivatives from as few as 13, or even 34, cases. One solution
to this problem would be to run the program in its population mode, making
appropriate corrections to the variance subroutine.



Table 4.4.3 Probability of an Incorrect Statement About the Statistics
in the STR and WTD Designs

                                   Probability of incorrect statement
                                   when a two-sided 95% confidence
                                   interval is to be used

Statistic                          STR design      WTD design

Means                              0.137 (0.095)   0.422 (0.283)
Correlation Coefficients           0.193 (0.125)   0.173 (0.113)
Path Coefficients                  0.211 (0.135)   0.164 (0.109)
Multiple Correlation Coefficients  0.076 (0.063)   0.022 (0.034)

Note: Results calculated on 3 degrees of freedom are in parentheses.
Table 4.4.4 Bias and Variance Contributions to the Relative Mean Square
Error for the Statistics in the STR and WTD Designs

Statistic                              Relative   Relative   Relative Mean
                                       Bias       Variance   Square Error

STR Means                              0.069      0.001      0.0?1
WTD Means                              0.308      0.003      0.311
STR Correlation Coefficients           0.116      0.003      0.119
WTD Correlation Coefficients           0.102      0.008      0.110
STR Path Coefficients                  0.132      0.004      0.135
WTD Path Coefficients                  0.088      0.012      0.100
STR Multiple Correlation Coefficients  0.297      0.091      0.388
WTD Multiple Correlation Coefficients  0.510      0.354      0.865



For the WTD sample design there were 50 cases for each stratum. This
may well be insufficient for good results. The effect of the weighting
process on the Woodruff-Causey program may also be quite negative. However,
the evidence is insufficient to draw any firm conclusions.

The relative contributions of bias and variance to RELMSE are given
in Table 4.4.4. The variance contributions, except for the Multiple
Correlation Coefficients, conform to the pattern of the SRS sample, whilst
the bias contributions are quite uniformly high. The high variance
contribution for the Multiple Correlation Coefficients is an interesting
counterpoint to the relative accuracy of their deft estimates.

The proportion of times that the 't' ratio falls within selected limits
is given in Table 4.4.5. The STR results here bear out the speculation that
the appropriate number of degrees of freedom could well be as low as three.
There seems to be no recognizable pattern to the WTD results. Once again,
the Multiple Correlation Coefficients successfully avoid fitting what
little pattern does emerge here.

The poor results for multiple correlation coefficients were not
unexpected. The simulation study by Frankel (1971) also produced poor
sampling error estimates for all three single-sample techniques under
investigation. In a later paper, Kish and Frankel (1974) attribute
this poor performance to the problem of using the multiple correlation
coefficient with multinomial data (Kish and Frankel, 1974:19 and 35).
Table 4.4.5 Proportion of Times that 't' Ratio Falls Within Selected
Intervals (STR and WTD Designs)

Statistic                              ±2.576   ±1.960   ±1.645

STR Means                              0.952    0.912    0.808
WTD Means                              0.696    0.600    0.560
STR Correlation Coefficients           0.912    0.812    0.712
WTD Correlation Coefficients           0.876    0.812    0.732
STR Path Coefficients                  0.880    0.764    0.680
WTD Path Coefficients                  0.916    0.809    0.729
STR Multiple Correlation Coefficients  0.760    0.733    0.707
WTD Multiple Correlation Coefficients  0.880    0.787    0.747



4.5 Comparison with Other Single-Sample Techniques

Ross (1976:46-50) used two other single-sample techniques to estimate
sampling errors. For the CLS design he used a Jackknife technique; the
results are given in Table 4.5.1. These results should be treated with
caution, as they are derived from only one example of the CLS design.
On comparing the percent errors in deft with those found for the Woodruff
technique, the Woodruff technique appears perhaps just a little superior.
Turning to the results for the WTD design in Table 4.5.2, the results for
both techniques are so poor that comparison is not rewarding.



Table 4.5.1 Results of Application of the Jackknife to One Example
of the CLS Design: Deft Estimates

                                                       Percent error
Statistic                          Empirical  Estimated  of estimator

Means                              2.80       3.09       10.4
Correlation Coefficients           1.53       1.63        6.5
Path Coefficients                  1.47       1.53        4.1
Multiple Correlation Coefficients  1.31       1.44        9.9

(after Ross, 1976:47-50)
Table 4.5.2 Results of Application of ... to One Example of the WTD Design:
Deft Estimates

                                                       Percent error
Statistic                          Empirical  Estimated  of estimator

Means                              2.89       4.12        42.6
Correlation Coefficients           1.85       1.66       -10.3
Path Coefficients                  1.73       1.63        -5.8
Multiple Correlation Coefficients  2.14       1.20       -43.9

(after Ross, 1976:47-50)



4.6 Summary

The Woodruff-Causey program has been found to give accurate and stable
estimates of the sampling errors of the statistics in the SRS, SCL and CLS
sample designs, with the exception of the Multiple Correlation Coefficients
in the two clustered designs. This exception is troublesome, as educational
researchers would usually not have the means of checking that the sampling
errors generated by the program had not 'inflated' as they did in this case.

The results for the stratified designs were not so encouraging,
although the fact that most of the estimators were quite stable leads one
to suspect that it may be possible to arrive at some bias correction factor
with further work. The poor results all occurred in cases where there was
some support for the idea that the sample sizes may have been unreasonably
small. This raises the point that this technique is not a way of
compensating for inferior sample design. If anything, accurate sampling
error estimation for higher-order statistics requires better samples than
those found adequate to estimate the first-order statistics.
CHAPTER 5

CONCLUSION

In this study an empirical sampling approach has been used to assess the
accuracy of an approximation technique for the estimation of sampling
errors in several sampling situations commonly used by educational
researchers. The investigation was limited to four types of statistics
used in correlational and regression studies: the mean, the correlation
coefficient, the path coefficient and the multiple correlation coefficient.
When applied to the simple random sample situation and the clustered
designs, the technique provided useful estimates for all the statistics
except the multiple correlation coefficient; the problem of sampling
error estimation for this statistic has been noted in previous research
(Kish and Frankel, 1974:55). The quality of the estimates declined
considerably, however, for the stratified designs; this leads to speculation
that the technique might only be reliable in cases where the minimum size
of the strata is reasonably high.

Table 5.1 gives some indication of the importance of finding a
successful solution to, or at least an arsenal of strategies to cope with,
the problem of estimating sampling error. It displays the probabilities
of an incorrect statement under a 95 per cent confidence interval which
would hold if the design factor were ignored. Note that in such a case
the researcher has assumed that simple random sampling gives an adequate
approximation to the sample design which was employed, and hence that
all these probabilities are not too far from 0.050. Patently, any
inferences made under these assumptions will be entirely untenable for the
SCL, CLS and WTD designs, whilst for the SRS and STR designs the simple
random sample assumption has led to rather conservative confidence intervals.

The Woodruff-Causey program has been shown here to provide a significant
improvement on this performance for cases where the effective sample
size is not too small. The program can give an estimate of the sampling
error for any statistic which can be expressed as a Fortran subroutine;
the user need only supply this subroutine and, depending on the circumstances,
a subroutine to estimate variance and a few data-manipulation subroutines.
For more standard situations, several less flexible but less demanding
programs (which were mentioned in Section 2.6) are now available.
Table 5.1 Probability of Incorrect Statements When the Design Factor
is Ignored

         Probability of incorrect statement when a
         two-sided 95% confidence interval is used

                   Correlation    Path           Multiple Correlation
Design   Means     Coefficients   Coefficients   Coefficients

SRS      0.05      0.03           0.05           0.01
STR      0.03      0.04           0.04           0.04
SCL      0.21      0.09           0.09           0.07
CLS      0.48      0.20           0.18           0.13
WTD      0.50      0.29           0.26           0.36

Note: The first row is taken from Table 4.2.3 of this study; the rest
are taken from Ross (1976:39-45).



In common with these other programs using the Taylor's series approximation,
the Woodruff-Causey program enjoys the advantages of relatively high
computational speed and transparency of assumptions. However, it also
handsomely repays the demands it makes on the skills of the researcher with
the marked flexibility it displays in handling diverse sampling situations,
in estimating the sampling errors of almost any statistic imaginable, and
in its adaptability to quite small computer installations.

The results have indicated the need for further evaluation of the
technique in situations where larger numbers of cases are involved, especially
for stratified and weighted sample designs.

Although the results of this study are only empirical estimates based
on particular sampling schemes and for particular statistics, the pattern
of results is most probably applicable to a wide range of studies undertaken
by educational researchers. Considering the broad range of possible sample
designs and statistical analyses which are available to the educational
research worker, it seems doubtful that a comprehensive theoretical
solution to the problem of sampling error will ever become available.
However, the problem is with us now, and approximation methods such as the
Woodruff-Causey program, if used cautiously, have been shown in this study
and elsewhere to give stable estimates of the often large sampling errors
present in educational research data.
This study investigates the accuracy of the Woodruff-Causey technique for
estimating sampling errors for complex statistics. The technique may be
applied when data are collected by using multistage clustered samples.
The technique was chosen for study because of its relevance to the
correct use of multivariate analyses in educational survey research.
A guide to the use of the technique and to writing the relevant Fortran
subroutines is included in microfiche appendixes.

The study also includes a review of the literature in the field.
APPENDIX J

NUMERICAL DIFFERENTIATION

If $f$ is the statistic under investigation using the Woodruff-Causey
program, then one of the important steps is the evaluation by numerical
methods of the partial derivatives

$$ \frac{\partial f(V_1, V_2, \ldots, V_r)}{\partial V_i}, \qquad i = 1, \ldots, r $$

at the expected values of the sums of the variates $V_i$. In practice these
expected values are unknown, so the derivatives can be evaluated only at the
actual sample values $v_1, v_2, \ldots, v_r$. The expression used to find
the partial derivatives is

$$ f'_i \approx \frac{f(v_1, v_2, \ldots, v_i + h, \ldots, v_r) - f(v_1, v_2, \ldots, v_i - h, \ldots, v_r)}{2h}, \qquad i = 1, 2, \ldots, r \qquad (1) $$



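This is a straightforward application of the usual definition of a
partial derivative:

$$ \frac{\partial f}{\partial v_i} = \lim_{h \to 0} \frac{f(v_1, v_2, \ldots, v_i + h, \ldots, v_r) - f(v_1, v_2, \ldots, v_i - h, \ldots, v_r)}{2h} \qquad (2) $$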



The only difficulty in applying the approximation (1) is in choosing a
suitable value for $h$. This is found by considering the possible errors
involved in the approximation.

It may be shown (Henrici, 1964:256) that the error involved
in the approximation is

$$ -\frac{h^{2}}{6}\, f'''(\xi) $$

(where $f'$ is used to denote the theoretical first derivative, etc.)
where $v_i - h < \xi < v_i + h$. In addition, one must consider the machine
error in the calculation of $v_i + h$ and $v_i - h$, which is approximately
bounded by $2|v_i|P$, where $P = 10^{-M}$ and $M$ is the number of
significant figures used by the machine. When these values are used
to find the two values of $f$, a further error bounded approximately by

$$ 2|f|P $$

is also involved (Woodruff and Causey, 1976:321). For a non-zero
partial derivative, the relative error is then bounded approximately by

$$ \frac{h^{2}}{6}\left|\frac{f'''(\xi)}{f'_i}\right| + \frac{P}{h}\left(|v_i| + \left|\frac{f}{f'_i}\right|\right) \qquad (3) $$



Obviously, as $h$ gets smaller the first term will decrease but, since
$P$ is fixed, the second term will increase. Thus the strategy is to
choose $h$ as small as possible without the second term of (3)
becoming too large. The program uses an iterative procedure to find an
appropriate $h$ according to the steps outlined above and, of course,
uses the current estimate from (1) to approximate $f'_i$. The only extra
problem occurs where $f'_i$ is either zero or very near to it: in this case
the iterative procedure is very slow, with the possibility that $h$ would
need to be very large before the second term becomes small. To circumvent
this problem, $f'_i$ is set to zero when $h$ exceeds



1000 
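A minimal sketch of approximation (1), with a step-size search in the spirit of the procedure described above. It is not the Woodruff-Causey code, whose exact rules are not reproduced here, and it rests on these assumptions:

- P is taken as the double-precision machine epsilon.
- Only the rounding part of bound (3) is controlled. The truncation part needs f''', so it is kept small simply by starting from a small h.
- h is increased tenfold per iteration.
- The cap that triggers a zero derivative is a hypothetical 1000 * max(|v_i|, 1).

```python
import sys

P = sys.float_info.epsilon  # stands in for 10**(-M), M = significant digits


def central_difference(f, v, i, h):
    """Approximation (1): symmetric difference quotient in coordinate i."""
    up = list(v)
    dn = list(v)
    up[i] += h
    dn[i] -= h
    return (f(up) - f(dn)) / (2.0 * h)


def partial_derivative(f, v, i, rel_tol=1e-6):
    """Estimate df/dv_i at v.

    h grows until the rounding term of bound (3), (P/h)(|v_i| + |f/f'_i|),
    is below rel_tol. If h passes the hypothetical cap, return 0.0.
    """
    scale = max(abs(v[i]), 1.0)
    h = 1e-8 * scale        # start small so the truncation term is small
    h_max = 1000.0 * scale  # hypothetical cap (assumption, see lead-in)
    fv = f(v)
    while h <= h_max:
        d = central_difference(f, v, i, h)
        if d != 0.0:
            rounding = P / h * (abs(v[i]) + abs(fv / d))
            if rounding <= rel_tol:
                return d
        h *= 10.0
    return 0.0  # derivative treated as zero


# A ratio-type statistic of three totals; d/dv_0 at (2, 3, 4) is 3/4.
ratio = lambda v: v[0] * v[1] / v[2]
print(partial_derivative(ratio, [2.0, 3.0, 4.0], 0))
```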
APPENDIX K

A USER'S GUIDE TO THE WOODRUFF-CAUSEY PROGRAM
FOR THE COMPUTATION OF THE SAMPLING ERROR OF
COMPLICATED ESTIMATES

The following user's guide has been written in an attempt to 'soften'
the rather technical documentation which accompanied the program
(Causey, 1976). Potential users must be warned, however, that only
those with more than a beginner's knowledge of Fortran should attempt
to use the program. Although the program demands some writing of
Fortran subroutines, the user will find that such efforts are well
rewarded, for the program exhibits great flexibility not only in
the type of sampling problem it can handle, but also in the procedures
it uses to solve the problem. Furthermore, in an environment where
particular sampling situations were the norm, it would not be difficult
to set up the program to handle such standard situations without
the need for subroutine writing. The following is based on the
technical documentation which accompanied the program; any errors
are, of course, the responsibility of the present author.

K1 A Worked Example

The example which follows was chosen as one that would indicate
the steps necessary to use the Woodruff-Causey program, and yet
be simple enough to provide an introduction to the technique.
Consequently, issues such as weighting, the use of temporary storage
space and the use of a user-written variance subroutine are left
to the formal description of the program in Sections K4 and K5.
K1.1 The problem

Suppose a researcher wishes to investigate how ambitious young
secondary students are, and how this might relate to their
ethnic origins. Some data are collected consisting of the
students' opinions as to their later occupations, the present
occupations of their fathers, and the language spoken in the
home. The data are coded according to Table K1.1. The scale
of occupational prestige is the six-point ANU scale (Broom
et al., 1977:112). In order to make the occupational
prestige scale amenable to a product-moment correlation
investigation, the occupational categories are transformed into
an approximately interval-level scaled score as in Table K1.2.



Table K1.1 Format of Input Data for 'A Worked Example'

Variable   Columns   Format   Comments

ID         1-3       I3       Identification number
FOCCUP     4         I1       Six-point scale of occupational prestige
EXPOCC     5         I1       Six-point scale of occupational prestige
           6         I1       English spoken in parental home = 1
                              A language other than English spoken
                              in parental home = 0
                              Missing data = 2
Table K1.2 The Six-Point ANU Scale of Occupational Prestige

Occupational grouping   Rank   Weighted social status score

Professional            1      662
Managerial              2      511
White Collar            3      508
Skilled Manual          4      485
Semi-skilled Manual     5      421
Unskilled Manual        6      418
Missing Data            7



An index of ambition, $a_i$, is formed in the following way from a
student's scores on EXPOCC and FOCCUP (the score on FOCCUP being
denoted $f_i$):

(1)



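Then, if $e_i$ is the student's score on the home-language variable
(column 6 of Table K1.1), find the product-moment correlation as

$$ r_{ae} = \frac{\sum_{i=1}^{n} (a_i - \bar{a})(e_i - \bar{e})}{\sqrt{\sum_{i=1}^{n} (a_i - \bar{a})^{2} \sum_{i=1}^{n} (e_i - \bar{e})^{2}}} \qquad (2) $$

where $n$ is the number of cases in the sample, $\bar{a}$ is the
average of the $a_i$, and $\bar{e}$ is the average of the $e_i$.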



For these data, the mean of the $a_i$ is found to be 56.24, and
the correlation between the $a_i$ and the $e_i$ is -0.1054.



The sampling scheme used to collect the data was a simple
random sample of 600 cases with replacement, so the usual
estimators of sampling error can be used: $s/\sqrt{n}$ for the
average of the $a_i$, where $s$ is the standard deviation of the
$a_i$ within the sample, and the corresponding textbook formula
for the correlation coefficient.

For these data, these take the values 3.95527 and 0.0436021
respectively, when all cases with any missing values are
deleted. Thus the correlation between language spoken at
home and this index of ambition would seem to be weak but
non-random, at least at a 95 per cent confidence level.
However, the researcher is, quite understandably, concerned
with the use of error estimates which involve assumptions
of normality when one of the variables is clearly not normally
distributed. The Woodruff-Causey program can be used to
clarify the situation.
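The conventional calculation described above can be sketched as follows. The helper names are illustrative. The standard error of the mean is s/sqrt(n), with s computed using divisor n - 1, and the significance check simply compares |estimate| / SE with 1.96.

Applied to the figures quoted in the text, -0.1054 / 0.0436021 gives a ratio of about 2.42. That is why the correlation looks non-random at the 95 per cent level.

```python
import math


def mean_and_se(values):
    """Sample mean and its simple-random-sampling standard error s/sqrt(n)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return mean, s / math.sqrt(n)


def pearson_r(a, e):
    """Product-moment correlation, formula (2)."""
    n = len(a)
    ma, me = sum(a) / n, sum(e) / n
    sae = sum((x - ma) * (y - me) for x, y in zip(a, e))
    saa = sum((x - ma) ** 2 for x in a)
    see = sum((y - me) ** 2 for y in e)
    return sae / math.sqrt(saa * see)


def significant_at_95(estimate, se, z=1.96):
    """True if the estimate lies more than z standard errors from zero."""
    return abs(estimate) / se > z


# Figures quoted in the text for the correlation and its standard error.
print(significant_at_95(-0.1054, 0.0436021))
```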

K1.2 Solving the problem

The program is capable of solving this problem in a number of
superficially different ways. The actual manipulations of the
data will be the same in each possible arrangement, but the
ways in which the control information and the data are fed
to it will differ markedly. The program will always
require the following types of information in some way:

1 The number of strata in the sample, and certain
information about each stratum.
2 The number of functions to be investigated, and a
'formula', in the form of a Fortran segment, for each.

3 The number of variates involved in the investigation,
an indicator of which variates are relevant to which
function, and the actual data on which the investigation
is to be made.

4 Auxiliary information such as extra output, temporary
storage, etc.

One particular way of solving the problem is descriV^-* 
A solution starts with the 'Problem Card 1 . 

the program interprets the information contained cn that card 
in the following way. 

1 A value of 1 in colunns 1 to 4. 

This means there is only one stratum to consider* 

2 A value of 2 in columns 5 to 8. 

This means there are two functions whose sampling 
errors are under consideration* 

3 A value of 5 in columns 9 to 12. 

This means there are five variates involved in the 
problem.. - " - - 

4 A value of 2 in column 13.

This means that the program must look to subroutine
NSTRAT for stratum information.

5 A value of 2 in column 16.

This means that the program must look to subroutine
WINPUT for the data.



6 A value of 1 in column 20. 

This means that certain information concerning each
derivative is to be printed out.

7 A value of 1 in column 35.

This means that not all the variates are involved in 
all the functions. 

One important point must be made: the Fortran function F
must calculate the functions under investigation using not
the individual values of the variates, but the sums over the
entire population of each of the variates. The distinction
is important. The variables are the measures which are under
investigation, whilst the variates are the variables plus
certain transformations of the variables which will be needed
in the calculation of the functions. Thus, in this case the
variables are as given in Table K1.1, but the variates, which
are the values to be read into the program, are quite different.
In order to calculate the two functions, the following five
sums are needed:

$$\sum e_i,\quad \sum e_i^2,\quad \sum a_i,\quad \sum a_i^2,\quad \sum a_i e_i$$

Hence, for each case the input data must be

$$e_i,\quad e_i^2,\quad a_i,\quad a_i^2,\quad a_i e_i$$

In general there will be more variates than variables. The
composition of this list of variates is not unique. For
instance, in this case it would have been quite possible to
write a Fortran function which calculated the mean and
correlation coefficient in terms of another set of variates.
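To check the arithmetic of the worked example, here is a Python sketch. It is only an illustration, since the program itself requires Fortran. Each case contributes the five variates $e_i, e_i^2, a_i, a_i^2, a_i e_i$, and both functions are computed from the column sums alone, with n the number of cases. The function names are hypothetical, and the variate order is the one assumed for the example above.

```python
import math


def variates(e, a):
    """The five variates for one case, in the order e, e^2, a, a^2, a*e."""
    return [e, e * e, a, a * a, a * e]


def column_sums(cases):
    """T(1)..T(5): the sum of each variate over all cases (0-based here)."""
    return [sum(col) for col in zip(*cases)]


def F(T, L, n):
    """Function 1: mean of a.  Function 2: correlation of e and a.

    Only the sums T and the case count n are used, never the
    individual values.
    """
    if L == 1:
        return T[2] / n
    num = n * T[4] - T[0] * T[2]
    den = math.sqrt((n * T[1] - T[0] ** 2) * (n * T[3] - T[2] ** 2))
    return num / den
```

For example, with e = [1, 2, 3, 4] and a = [2, 4, 6, 8] the sums are [10, 30, 20, 120, 60]. This gives a mean of 5.0 for function 1 and a correlation of 1.0 for function 2.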

Having described on the Problem Card how the data are to be
treated, the user must then write several Fortran
segments: always a Main Program and a function, F, and, if
the user has so indicated on the Problem Card, any others
which are necessary.



The Main Program. This is used to start the program
executing, to reserve dimensioned array space and common
blocks for later subroutines, to carry out any calculations
which need be performed only once, and to call the first of
the supplied subroutines, PREPAR. The legible part of the
listing for this example reads:

C     MAIN PROGRAM FOR THE EXAMPLE (PARTIAL LISTING)
      DIMENSION IX(3),W(1003,5)
      COMMON/COMMW/W,K
      COMMON/COMMN/NSEL
      ...
      WRITE(6,2)
    2 FORMAT(' SAMPLE VARIANCE FUNCTION ...')
      DO 50 ...
      READ(5,10) (IX(I),I=1,3)
C     SKIP ANY CASE WITH A MISSING VALUE (LISTWISE DELETION)
      IF(IX(1).EQ. ...) GO TO 50
      IF(IX(2).EQ. ...) GO TO 50
      IF(IX(3).EQ. ...) GO TO 50
      ...

Note that data cases with missing values are eliminated
entirely from the calculations. This is necessary because at a
later stage a linear combination of the variates is to be
calculated for each case (the result of this operation is
called a 'U-statistic').
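The listwise deletion described above can be sketched in a few lines of Python. This is illustrative only: `listwise_delete` and the use of a single missing-value code are assumptions, not part of the program.

```python
def listwise_delete(cases, missing):
    """Drop every case in which any variable equals the missing-value code.

    A case must be complete because its U-statistic is a linear
    combination of all of its variates.
    """
    return [case for case in cases if all(v != missing for v in case)]
```

The number of cases remaining, `len(listwise_delete(cases, code))`, is the count that the main program places in a common block for NSTRAT.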

Subroutine WINPUT. This was requested by the fifth entry
on the Problem Card. Its function is to supply the five
variates, one case at a time, to PREPAR. In this example
the variates have been placed in the Common block COMMW by
the Main Program, so all that this subroutine need do is
transfer the cases in the correct order back to the calling
subroutine through the argument W.

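      SUBROUTINE WINPUT(W,M)
      DIMENSION W(M)
C     WW HOLDS THE VARIATES STORED BY THE MAIN PROGRAM;
C     K COUNTS THE CASES SUPPLIED SO FAR
      COMMON/COMMW/WW(1003,5),K
      K=K+1
      DO 10 J=1,5
      W(J)=WW(K,J)
   10 CONTINUE
      RETURN
      END

Figure K1.2 Subroutine WINPUT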

Subroutine NSTRAT. This was requested by the fourth entry
on the Problem Card. Its function is to supply four pieces
of information for each stratum to the calling subroutine.
The information needed is 'N', the number of cases in the
stratum; 'FR', the sampling fraction; 'NT', the total number
of cases (which should be set to zero if FR is supplied); and
a quantity named IV which tells the program how to estimate
the variance. In this case there is only one stratum, the
number of cases has been placed in Common block COMMN by the
main program, the sampling fraction is 0.0 since we have
sampling with replacement, and the value of IV is 0, which
indicates that the program is to use its default simple random
sampling variance formula.

      SUBROUTINE NSTRAT(I,N,FR,NT,IV)
C     NSEL IS THE NUMBER OF CASES LEFT AFTER LISTWISE DELETION
      COMMON/COMMN/NSEL
      N=NSEL
      FR=0.0
      NT=0
      IV=0
      RETURN
      END

Figure K1.3 Subroutine NSTRAT

The Function F. This must always be supplied by the user.
Its purpose is to calculate the functions under investigation
using the variate sums (which in this case are contained in
the first argument, T) in the order in which they were
supplied from WINPUT. The number of the function is
supplied by a Common block called COMMF which is defined within
subroutine CENVAR. This function is to be calculated using
double precision wherever possible.



      DOUBLE PRECISION FUNCTION F(T,NT,S,MS,NS,R,NR)
      DOUBLE PRECISION T(NT),S(MS,NS),R(NR)
      DOUBLE PRECISION FN,V(2)
      COMMON/COMMF/L
      COMMON/COMMN/NSEL
      FN=NSEL
      IF(L.EQ.1) GO TO 99
C     FUNCTION 2: CORRELATION OF THE TWO VARIABLES FROM THE FIVE SUMS
      V(1)=FN*T(2)-T(1)*T(1)
      V(2)=FN*T(4)-T(3)*T(3)
      F=(FN*T(5)-T(1)*T(3))/DSQRT(V(1)*V(2))
      RETURN
C     FUNCTION 1: MEAN OF VARIATE 3
   99 F=T(3)/FN
      RETURN
      END

Figure K1.4 The Function 'F'

The Subroutine NSUBFV. This subroutine is requested by the
last entry on the Problem Card. Its purpose is to inform
the program of which variates are involved in the calculation
of each function. In this case function 1, the mean,
involves only variate 3, whereas function 2, the correlation
coefficient, involves all of the variates. The subroutine
has three arguments. The first is the number of the function, L,
and the second is the number of the variate, J. The third
is a value to be supplied, which equals two if the variate J
is involved in the calculation of the function L, and equals
one if it is not involved.



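      SUBROUTINE NSUBFV(L,J,IFV)
C     DEFAULT: VARIATE J IS NOT INVOLVED IN FUNCTION L
      IFV=1
C     FUNCTION 2 USES ALL VARIATES; VARIATE 3 IS USED BY BOTH
      IF(L.EQ.2) IFV=2
      IF(J.EQ.3) IFV=2
      RETURN
      END

Figure K1.5 Subroutine NSUBFV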
The Dummy Subroutines. Although they are not involved in
this example, the Fortran compiler at many installations will
require the following two dummy subroutines.

      SUBROUTINE VARNCE(T,M,I,N,FR,Q,INV)
      RETURN
      END

      SUBROUTINE NSUBHV(I,J,IHV)
      RETURN
      END

The first would be needed if the user wished to supply a
variance formula different from the simple random sampling
formula (which is the default). The second performs a
similar function for strata as NSUBFV performs for variates.

The Supplied Subroutines. The 'program' as supplied consists
of the three subroutines PREPAR, CENVAR and SWITCH. These
need not be altered.

The Printed Output. The printed output from this example
is given in Figure K1.6.

[Printed output listing NSEL, the DIMENSION LIMITS, the derivatives for functions 1 and 2, and a summary line for each function; the numerical values are not legible in the source.]

Figure K1.6 The Output from the Example Run
'NSEL' is the number of cases left after listwise deletion.

The 'DIMENSION LIMITS' are the required minimum dimension
sizes needed for the arrays A, N and D, respectively, of the
Main Program. The derivative information for each function
consists of the number of the variate involved, the number of
iterations needed to achieve stability, the value of the
derivative, and the final increment.

The information given for each function is

1 the number of the function

2 the estimated value of the function 

3 the estimated variance of the function 

4 the estimated standard deviation of the function

5 the coefficient of variation (i.e. (4) divided by (2))
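Items 4 and 5 follow directly from items 2 and 3. A Python sketch (the function name is hypothetical):

```python
import math


def summary_line(L, estimate, variance):
    """Items 1 to 5 of the printed summary for function L."""
    sd = math.sqrt(variance)   # item 4: standard deviation of the estimate
    cv = sd / estimate         # item 5: coefficient of variation, (4)/(2)
    return L, estimate, variance, sd, cv
```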

The results indicate that the standard error for the mean
given by the usual formula was as accurate as one could
ever expect, but then its accuracy was not in question. For
the correlation coefficient, however, the program has revealed
a 3.5 per cent inaccuracy in the estimate of error given
by the usual formula. It is instructive, moreover, that this
inaccuracy does not alter the earlier verdict regarding the
correlation coefficient.


K2 A Formal Description of the Situation

K2.1 Non-Stratified Case

For a certain population, consider V variates with totals
$Y_1, Y_2, \ldots, Y_V$ over the population which are combined in some
way in a function F of all V variate totals. The goal is to
estimate $\mathrm{Var}(\hat F)$, the variance of the sample estimate of F.

Suppose a sample is now drawn from the population and in that
sample $X_{ij}$ is the value of variate i for case j. If, furthermore,
$\pi_j$ is the probability (prior to inclusion) of j in the sample,
let

$$W_{ij} = \frac{X_{ij}}{\pi_j}$$

and then

$$\hat Y_i = \sum_{j=1}^{n} W_{ij},$$

where n is the total number of cases in the sample,
is an estimator of $Y_i$, the population sum.
This allows one to estimate

$$F = F(Y_1, Y_2, \ldots, Y_V) \quad\text{with}\quad \hat F = F(\hat Y_1, \hat Y_2, \ldots, \hat Y_V).$$
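The weighting and summation above can be written out directly. A short Python sketch (names are illustrative; `X[j][i]` holds $X_{ij}$ with 0-based indices):

```python
def estimated_totals(X, pi):
    """Return (W, Y_hat), where W[j][i] = X[j][i] / pi[j] and Y_hat[i] = sum_j W[j][i]."""
    W = [[x / p for x in row] for row, p in zip(X, pi)]
    Y_hat = [sum(col) for col in zip(*W)]
    return W, Y_hat
```

$\hat F$ is then obtained by passing `Y_hat` to whatever function defines F.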
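The first-order Taylor expansion of $\hat F$ about F is then

$$\hat F_2 = F + \sum_{i=1}^{V} D_i (\hat Y_i - Y_i),$$

where $D_i$ is the derivative

$$D_i = \frac{\partial F}{\partial Y_i}$$

evaluated at $(Y_1, Y_2, \ldots, Y_V)$. In practice the $Y_i$ are unknown, so the derivatives are evaluated at the estimates $\hat Y_i$.

$\mathrm{Var}(\hat F)$ is then approximated by $\mathrm{Var}(\hat F_2)$.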
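Because F is supplied only as a subroutine, the $D_i$ have to be found numerically. The program's own differentiation procedure is described in a separate guide and is not reproduced here. The Python sketch below is one plausible scheme, assumed for illustration: a central difference whose increment is halved until two successive estimates agree, with at most 8 iterations. It returns the derivative, the number of iterations and the final increment, which correspond to three of the items in the optional derivative printout. The starting increment and tolerance are arbitrary choices.

```python
def derivative(F, Y, i, h0=None, tol=1e-8, max_iter=8):
    """Central-difference estimate of dF/dY_i at the point Y (a list of totals).

    Hypothetical halving scheme, not necessarily the program's own.
    Returns (derivative, iterations used, final increment).
    """
    h = h0 if h0 is not None else 0.01 * (abs(Y[i]) or 1.0)
    prev = None
    for it in range(1, max_iter + 1):
        up = list(Y)
        up[i] += h
        dn = list(Y)
        dn[i] -= h
        d = (F(up) - F(dn)) / (2.0 * h)
        # Stop when two successive estimates agree to a relative tolerance.
        if prev is not None and abs(d - prev) <= tol * max(1.0, abs(d)):
            return d, it, h
        prev = d
        h /= 2.0
    # No agreement within max_iter: return the last estimate and its increment.
    return prev, max_iter, 2.0 * h
```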
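Since F, the $D_i$ and the $Y_i$ are constants, the variance of $\hat F_2$ is given by

$$\mathrm{Var}(\hat F_2) = \mathrm{Var}\Big(\sum_{i=1}^{V} D_i \sum_{j=1}^{n} W_{ij}\Big) = \mathrm{Var}\Big(\sum_{j=1}^{n} \sum_{i=1}^{V} D_i W_{ij}\Big) = \mathrm{Var}\Big(\sum_{j=1}^{n} U_j\Big),$$

where

$$U_j = \sum_{i=1}^{V} D_i W_{ij}.$$

These U's are called 'U-statistics'.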
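Once the $D_i$ are known, each case's U-statistic is a dot product of the derivatives with that case's weighted variates. In Python (names illustrative, 0-based indices):

```python
def u_statistics(W, D):
    """U_j = sum_i D[i] * W[j][i] for every case j."""
    return [sum(d * w for d, w in zip(D, row)) for row in W]
```

Summing the result over all cases gives $\sum_i D_i \hat Y_i$, the only random part of $\hat F_2$.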

The variance of this total is found in the same way as one
would find the variance of any total under the original
sampling scheme. Where the sampling scheme was simple random
sampling, the appropriate variance estimator is

$$\widehat{\mathrm{Var}}\Big(\sum_{j=1}^{n} U_j\Big) = \frac{n(1-f)}{n-1} \sum_{j=1}^{n} (U_j - \bar u)^2,$$

where f is the sampling fraction
and $\bar u$ is the mean U-statistic.

This formula may be chosen as a default in the program. 
The calculations may be performed using the weighted W's or 
the unweighted X's. 
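The default formula translates directly. A Python sketch (illustrative names; `f` is the sampling fraction, which is 0 for sampling with replacement as in the worked example):

```python
def srs_variance(U, f=0.0):
    """n(1 - f)/(n - 1) * sum_j (U_j - u_bar)^2 for the total of the U-statistics."""
    n = len(U)
    if n < 2:
        raise ValueError("at least two cases are needed")
    u_bar = sum(U) / n
    return n * (1.0 - f) / (n - 1) * sum((u - u_bar) ** 2 for u in U)
```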
K2.2 Independent Strata

If the sampling within each stratum is independent of the
rest, the above process may be repeated within each stratum.
Let H be the number of strata. Use the subscript 'h' to
denote that a certain variate or statistic pertains only to
stratum h. The above formulae may then be rewritten to
describe the application of the process to a stratified
situation. The variance estimation formula for simple
random sampling within each stratum becomes

$$\widehat{\mathrm{Var}}\Big(\sum_{h=1}^{H}\sum_{j=1}^{n_h} U_{jh}\Big) = \sum_{h=1}^{H} \frac{n_h(1-f_h)}{n_h-1} \sum_{j=1}^{n_h} (U_{jh} - \bar u_h)^2.$$

K3 Formal Description of the Use of the Program
The required inputs to the program are of three types. First, a
'Problem Card' tells the program the type of problem under
consideration and the way in which it is to be handled. Second,
a main program and a series of subroutines must be included with
the source deck. Third, the data must be provided in the various
ways specified in the Problem Card.

K3.1 The Problem Card

The information concerning the type of problem to be
investigated and the way that the user wishes it to be handled
is supplied to the program in terms of numbers punched on the
Problem Card. This card should be left blank unless otherwise
indicated. There are seventeen such numbers. They are described
below under the names 'PC1', 'PC2', ... 'PC17', which indicate their
sequence on the Problem Card.

K3.1.1 A Column-by-Column Commentary.

PC1 Title: Number of strata Columns: 1-4 Format: I4

Comments: This indicates the number, H, of independent
strata. If there are no strata, then H = 1.

PC2 Title: Number of functions Columns: 5-8 Format: I4
Comments: This indicates the number, G, of functions
for which variance estimates are desired.
PC3 Title: Number of variates Columns: 9-12 Format: I4
Comments: This indicates the number, V, of variates
which are input as data to calculate the
value of the functions under consideration.
Note that the number of variates will generally
be greater than the number of 'variables'
involved in the functions under consideration.
PC4 Title: Input Mode for Stratum Descriptors
Column: 13 Format: I1

Comments: For each stratum, the user must supply four
quantities. This entry tells the program
where to look for them.

If they are to be read from cards, place a '0'
in this column.

If they are to be read from an unformatted
binary file, place a '1' in this column.

If they are to be provided from a user-
written external subroutine NSTRAT, place a
'2' in this column.

Details of the inputs to be provided by each
of these modes are contained in Section K3.1.2,
The Stratum Descriptors.

PC5 Title: Stratum Descriptors Input File

Columns: 14-15 Format: I2

Comments: If the value of PC4 is '1' then the user
must here indicate the internal unit number of
the unformatted file on which resides the
Stratum Descriptor information.
If the value of PC4 is not '1' then these
columns are to be left blank.
Of course the user must ensure that this
internal unit number is not used in any other
part of the program.
PC6 Title: Input Mode for Variate Values
Column: 16 Format: I1

Comments: For each sample unit in each stratum the user
must provide the V variate values. This
column tells the program where to find them.
If they are to be read from cards, place a '0'
in this column.

If they are to be read from an unformatted
binary file, place a '1' in this column.

If they are to be provided from a user-written
external subroutine WINPUT, place a '2' in this
column.

Details of the inputs to be provided by each
of these modes are contained in Section K3.1.3,
The Variate Values.



PC7 Title: Variate Values Input File or Number of
Cards Columns: 17-18 Format: I2
Comments: If the value of PC6 is '0' then the user
must here indicate the number of cards needed
to give the format of the card or cards
from which the V values are to be read for each
sample unit in each stratum.
If the value of PC6 is '1' then the user must
here indicate the internal unit number of the
unformatted binary file on which resides the
Variate Values information.
If the value of PC6 is not '0' or '1' then these
columns are to be left blank.
Of course the user must ensure that this
internal unit number is not used in any other
part of the program.
PC8 Title: U-Statistics Column: 19 Format: I1

Comments: If the user desires a printout of the U-statistics
for each case, a '1' is placed in this column.
Note that these are printed in the format
10(1X,E11.6)
Thus the output file or printer must be
capable of receiving lines of 120 characters.
If the U-statistics are not needed, leave this
column blank.

PC9 Title: Derivatives Column: 20 Format: I1

Comments: If the user desires information concerning the
derivatives of each function with respect to
the appropriate variates, a '1' is placed in
this column. The information consists of:

1 the number of the variate involved in the
differentiation;

2 the number of iterations involved in the
numerical procedure (** denotes the
maximum of 8);

3 the value of the derivative;

4 the final increment used in the numerical
procedure.

This information is provided in 45 characters
across each line.

If this information is not needed, leave this
column blank.
PC10, PC11, PC12, PC13, PC14

Collective Title: Temporary Storage Columns: 21-33
Comments: These columns may be left blank unless the
amount of space needed for storage of
dimensioned arrays, which is discussed in
Section K3.2, is beyond the capacity of the
particular machine being used. If this is the
case, consult Section K4.
PC15 Title: Irrelevant Strata Column: 34 Format: I1

Comments: If all variates are defined for all strata,
leave this column blank.

If for some strata certain variates are not
defined, place a '1' in this column. The
user must then provide a user-written external
subroutine NSUBHV which informs the program of
the appropriate pairs of variates and strata.
This is described in Section K3.2.6.
Note that an alternative strategy is to supply
a value of zero for the variate in the relevant
strata. This will result in a loss of efficiency
and may also lead to execution errors.
PC16 Title: Irrelevant Variates Column: 35 Format: I1

Comments: If, apart from the considerations of Irrelevant
Strata (PC15), all functions are defined
using stratum totals of all of the variates,
leave this column blank.

If all functions are defined using population
totals of all of the variates, place a '2'
in this column.

If some variates are not involved in the
calculation of some functions, regardless of
whether stratum totals or population totals
are used, place a '1' in this column.
The user must then provide a user-written
external subroutine NSUBFV which informs
the program of the appropriate action to be
taken with respect to each pair of function and
variate. This is described in Section K3.2.7.
Note that an alternative strategy of allowing
the program to calculate derivatives of
irrelevant variates will result in a loss of
efficiency and may also lead to execution
errors.

The user must be careful at this point to
ensure

1 that population sums or stratum sums are
used in the function F (see Section K3.2.2)
to correspond with what the user has
indicated here;

2 that the conditions indicated here do not
conflict with those indicated in subroutine
NSUBHV (see Section K3.2.6).
PC17 Title: Preliminary Run Column: 36 Format: I1

Comments: If the user wishes to obtain lower bounds for
the dimensioned arrays discussed in Section
K3.2.1 without running the entire program,
place a '1' in this column.
This step will often not be necessary, as all
that is needed is to exceed these limits and
still remain within the available storage
space.

K3.1.2 The Stratum Descriptors. For each stratum the
following information must be provided:

1 The stratum sample size, N, in integer format.

2 The stratum sampling fraction, FR, in floating point
format; this is the ratio of stratum sample size to
total stratum size.

3 The total stratum size, NT, in integer format. This
value is used only to compute the sampling fraction. If
NT is set to zero, then FR will be used as the sampling
fraction.

4 A value, IV, which informs the program of the mode of
variance computation to be employed:

If variance computation is to be done, for the stratum
under consideration, using the internal simple random
sampling variance formula applied to externally-weighted
data, then set IV equal to 0.

If the same procedure is to be followed, but the data
are to be internally-weighted, then set IV equal to 1.

If the variance computation is to be done, for the stratum
under consideration, using a user-written external
subroutine VARNCE applied to externally-weighted data,
then set IV equal to 2.

If the same procedure is to be followed, but the data are
to be internally-weighted, then set IV equal to 3.

Where no strata are involved, that is, where only population
values are under consideration, this information is to be
provided as though for stratum one.
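The rules for the four descriptors can be summarised in a short Python sketch. It is illustrative only: the class and method names are hypothetical and simply restate items 1 to 4 above.

```python
from dataclasses import dataclass


@dataclass
class StratumDescriptor:
    n: int      # N: stratum sample size
    fr: float   # FR: sampling fraction
    nt: int     # NT: total stratum size, or 0 when FR is supplied directly
    iv: int     # IV: variance mode, 0 to 3

    def sampling_fraction(self):
        # NT is used only to compute the fraction; NT = 0 means FR is used.
        return self.n / self.nt if self.nt > 0 else self.fr

    def internally_weighted(self):
        # IV = 1 or 3: the program divides by the selection probability.
        return self.iv in (1, 3)

    def uses_varnce(self):
        # IV = 2 or 3: variance from the user-written subroutine VARNCE.
        return self.iv in (2, 3)
```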

Note that if IV indicates internal weighting, all calculations
will be in terms of variate values divided by the probability of
selection within the relevant stratum. If IV indicates
external weighting, the user may pre-weight all data before they
are input into the program. Alternatively, if using an external
subroutine VARNCE, the user may input unweighted data and make
appropriate adjustments within the function and variance
subroutines.

The program finds this information in the way indicated by
the value of PC4. If PC4 equals zero, the information is to
be read from cards. The first card must give, in columns 1
to 72 and in format (12A6), the format of the H cards, one per
stratum, providing for each of the H strata the four quantities.
The remaining H cards containing this information then follow.
If PC4 equals 1, the four quantities are to be read (in H
groups of four) from the unformatted binary file with internal
unit number indicated by PC5. If PC4 equals 2, the four quantities
are to be provided by the subroutine NSTRAT which is described
in Section K3.2.4.



K3.1.3 The Variate Values. For each case in each stratum the
user must provide the V variate values. These must be in the
order of their relevant strata, and exactly the number of cases
indicated by N must be present for each stratum. Furthermore,
the user should ensure that the values are weighted or
unweighted according to the specified value of IV. (See the
previous section for a definition of N and IV.)

If PC6 equals zero, the user must provide a number of cards,
specified by PC7 (maximum: 2), containing the format
information concerning the cards from which the V values are
to be read for each case in each stratum. The remaining cards,
containing all this information, then follow.

If PC6 equals one, each set of V values is to be read from an unformatted
binary file with internal unit number indicated by PC7. If
PC6 equals two, each set of V values is to be provided by the subroutine
WINPUT which is described in Section K3.2.5.

K3.1.4 Computational Efficiency. The program may be
instructed to disregard the input of data for certain strata
for certain variates (see PC15 and Section K3.2.6). It may also be
instructed to use only population sums, or to disregard certain
variates in the computation of certain functions (see PC16 and
Section K3.2.7). On some systems these steps will be necessary,
and on all systems they will enhance efficiency.
As a general rule, computation will always be faster when
only population sums are used. In addition, the function
subroutine will be easier to write and the numerical
computation of derivatives will be more accurate. Thus,
wherever possible, population sums should be used.

It was noted in the 'Worked Example' that different sets
of variates may be used to calculate the same function. Where
this is the case, the smallest set of variates will give the
fastest solutions.

K3.2 The Main Program and the Subroutines
The user must always supply a main program and a double-
precision function F in order to run the program. Several
other subroutines will be necessary, depending on the options
specified on the Problem Card. On some installations it will
be necessary to provide dummy versions of these subroutines
even if they are never called. The part of the program which
has been supplied consists of three subroutines, PREPAR,
CENVAR and SWITCH. These must always be included in the
source deck and are referred to collectively as the
'core subroutines'.

K3.2.1 The Main Program. The main program performs five
functions.

1 It begins the operation of the program. On some systems
this is achieved by making the first line a 'PROGRAM' line;
on others the system detects that it is a main program
simply by the absence of a SUBROUTINE or FUNCTION statement
at the beginning.

2 It reserves sufficient space for the dimensioned arrays
which are to be used by all the subsequent subroutines.
The amount of space needed by the core subroutines is
indicated below. If the space needed by these subroutines
is not provided for in the main program, execution will
terminate and one of the core subroutines will print out
the necessary dimension limits. If any of the user-
written subroutines require more space than is already
reserved, the user must make allowances for this in the
main program.

3 Beyond what is called for in the Problem Card, it can
provide in advance data for any of the user-written
subroutines. These could be read into COMMON areas,
other than those named COMMF or PREPGN, for use at any
subsequent stage.

4 It can be used to 'open' and 'close' files, rewind
tapes, etc.

5 It calls the first core subroutine, PREPAR.

The form of the main program is:

      DIMENSION A(a),N(n)
      DOUBLE PRECISION D(d)
      CALL PREPAR(A,a,N,n,D,d)
      STOP
      END

As previously noted, a first 'PROGRAM' line may be necessary
on some systems. 'A' is a real array, 'N' is an integer array,
and 'D' is a double precision array; they are of dimension
a, n and d respectively. These dimensions are actual
numerical values which may be found by making a preliminary
run with PC17 = 1 (for this preliminary run they may all be set
equal to, say, 100). Alternatively, lower bounds may be
calculated by hand as follows.

If no temporary storage is called for (that is, all of
PC10, PC11, PC12, PC13, PC14 are zero or blank),
and if

H = number of strata
V = number of variates
R = total number of sample cases for all strata
K = number of sample cases in the largest stratum

then

$$a \ge VR + K + H + 4V, \qquad n \ge 2H + V, \qquad d \ge VH + 2.$$



These bounds need only be exceeded if the user has written
subroutines which will use core dimensioned array space.
If population sums are used, or if the optional printouts are
not called for, or if temporary storage on internal units is
called for, then much smaller dimension bounds will suffice.
Exact bounds in these cases may be found by consulting Section
K4, The Problem of Insufficient Storage Space.
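The hand calculation of the lower bounds is easily scripted. A Python sketch (the function name is illustrative; it applies only when no temporary storage is requested):

```python
def dimension_bounds(H, V, R, K):
    """Lower bounds for the dimensions a, n and d of arrays A, N and D."""
    a = V * R + K + H + 4 * V
    n = 2 * H + V
    d = V * H + 2
    return a, n, d
```

With H = 1 and V = 5, as in the worked example, both n and d come out as 7. The bound a depends on the number of cases.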



K3.2.2 The Function F. The user must always provide a double
precision function F defining the functions under investigation.
Double precision is needed here to ensure accuracy in the
numerical computation of the derivatives, and should be used
to the fullest extent in defining F.
The form of the function F is:

      DOUBLE PRECISION FUNCTION F(T,NT,S,MS,NS,R,NR)
      DOUBLE PRECISION T(NT),S(MS,NS),R(NR)
      COMMON/COMMF/L
C     STATEMENTS DEFINING F FOR FUNCTION NUMBER L
      RETURN
      END

The arrays T and S and the common block COMMF should not be
altered in a problem needing no temporary storage. If temporary
storage is used, consult Section K4. The common block contains
L, which is the number of the function to be evaluated. The
function is to be defined in terms of population sums, which are
supplied in array T, or stratum sums, which are supplied in
array S, whichever is appropriate. Note that T is of
dimension V and S of dimension (H,V); that is, element
S(I,J) will be the stratum sum for stratum I and variate J.
Two possible pitfalls should be noted. First, if PC15 = 1,
then for some pairs the stratum sum S(I,J) does not exist;
if it appears in function F, its value must be set to zero.
Secondly, for each function and for each variate, the
FUNCTION F must use stratum sums, population sums, or neither,
according to what is called for by PC16 and SUBROUTINE NSUBFV.

K3.2.3 The Subroutine VARNCE. The user need supply this
subroutine only if, for some strata, the internally-provided
simple random sampling formula is not appropriate. The
subroutine is called for each stratum in order to provide the
variance of the U-statistics.

The form of the subroutine VARNCE is

      SUBROUTINE VARNCE(T,M,I,N,FR,Q,INV)
      DIMENSION T(M)
C     STATEMENTS SETTING Q, THE VARIANCE CONTRIBUTION OF STRATUM I
      RETURN
      END



The array T consists of the U-statistics for the particular
stratum under consideration. If the user indicated that the
input data were unweighted (see Section K3.1.2), the U-statistics
will now be weighted. The other variables are:

M : the dimension of array T
I : the stratum under consideration
N : the stratum size
FR : the stratum sampling fraction
INV : an internal unit number, needed only if
temporary storage is used.

The variance contribution for each stratum is evaluated in
terms of the U-statistics and placed in variable Q.

If temporary storage has been called for in the Problem Card,
consult Section K4.
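To illustrate what a replacement VARNCE might compute for a clustered design, here is a Python sketch of the common with-replacement ('ultimate cluster') estimator for one stratum. The U-statistics of cases in the same primary sampling unit are totalled, and Q is m/(m - 1) times the sum of squared deviations of these m unit totals. Both the choice of estimator and the cluster labels are assumptions. VARNCE as specified receives only T, M, I, N, FR, Q and INV, so in Fortran the labels would have to reach it through a common block filled by the main program.

```python
from collections import defaultdict


def varnce_clusters(U, cluster_of):
    """Hypothetical VARNCE: with-replacement (ultimate cluster) estimator.

    U[j] is the weighted U-statistic of case j and cluster_of[j] the
    primary sampling unit it belongs to.  Returns Q for the stratum.
    """
    totals = defaultdict(float)
    for u, c in zip(U, cluster_of):
        totals[c] += u
    m = len(totals)
    if m < 2:
        raise ValueError("at least two primary units are needed")
    z_bar = sum(totals.values()) / m
    return m / (m - 1) * sum((z - z_bar) ** 2 for z in totals.values())
```

With two primary units, the result reduces to the square of the difference between the two unit totals.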

K3.2.4 Subroutine NSTRAT. The user need supply this subroutine
only if PC4 equals 2. The subroutine must supply the four
quantities N, FR, NT and IV for each stratum (see Section
K3.1.2).
The form of the subroutine NSTRAT is

      SUBROUTINE NSTRAT(I,N,FR,NT,IV)

      RETURN
      END

The variable I holds the number of the stratum for which the
four quantities are needed. One way of achieving this would
be to read in or calculate the H sets of four quantities in
the main program and transfer them to NSTRAT in a common block.

K3.2.5 Subroutine WINPUT. The user need supply this
subroutine only if PC6 is 2. The subroutine must supply, in
turn, a set of V variate values for each case in each stratum
(see Section K3.1.3).

The form of the subroutine WINPUT is:

      SUBROUTINE WINPUT(W,M)
      DIMENSION W(M)

      RETURN
      END

If no temporary storage is called for, the dimension of array W
will be V, the number of variates. If temporary storage is
called for, see Section K4. One way of arranging this subroutine
would be to read in the variate values in the main program,
transfer them to WINPUT in a common block, and read from this
common block into array W the appropriate variate values of
each case. An extra integer variable, say K, included in the common
block could be used to keep track of which was the
appropriate case, simply by incrementing K by one each time
WINPUT is called.
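The counter technique just described can be mirrored in Python for illustration. The class name is hypothetical, and the list of cases stands in for the common block filled by the main program. Python indexes from 0, so the counter is used before it is incremented, whereas the Fortran version increments K first.

```python
class CaseFeeder:
    """Mimics WINPUT: returns the next case's variates on each call."""

    def __init__(self, cases):
        self.cases = cases  # variate lists for all cases, in stratum order
        self.k = 0          # plays the role of the counter K in the common block

    def winput(self):
        w = list(self.cases[self.k])
        self.k += 1
        return w
```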

K3.2.6 Subroutine NSUBHV. The user need supply this
subroutine only if PC15 equals one. The subroutine must
indicate to the calling subroutine, for each pair of stratum I
and variate J, whether variate J is defined in stratum I.

The form of subroutine NSUBHV is:

      SUBROUTINE NSUBHV(I,J,IHV)

      RETURN

      END

The subroutine carries out its purpose by returning, for
each pair of stratum I and variate J, the value IHV = 1 if
variate J does not appear for stratum I and IHV = 0 otherwise.
Note that for any pair for which IHV = 1, dummy variates must
still be supplied for each of the cases in stratum I.

The data supplied in this subroutine should not be obtained by
reading in fresh input from unit 5. One way of implementing
the subroutine would be to read in the relevant data in the
main program and transfer it to NSUBHV through a common block.

K3.2.7 Subroutine NSUBFV. The user need supply this subroutine only if PC16 equals one. The subroutine must indicate to the calling subroutine, for each pair of variate J and function L, whether:

(1) the stratum sum of variate J is involved in the computation of function L;

(2) variate J is not involved at all in the computation of function L;

(3) the population sum of variate J is involved in the computation of function L.



The form of the subroutine NSUBFV is:

      SUBROUTINE NSUBFV (L,J,IFV)
C     ...
      RETURN
      END

The value of IFV is to be set to

(a) zero if case (1) above holds;

(b) one if case (2) above holds;

(c) two if case (3) above holds.

The values of IFV should not be determined by reading in fresh input from unit 5. One way of implementing this subroutine would be to read the required values of IFV into the main program and transfer them to NSUBFV in a common block.
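The main-program side of this arrangement can be sketched as follows. The helper SETFV, the common block /FVTAB/ and its bounds (5 functions by 20 variates) are hypothetical. SETFV decodes one card image holding the IFV codes for function L (format I1, one per variate); an internal READ from a CHARACTER variable stands in for the card so the sketch needs no input file.

```fortran
      SUBROUTINE SETFV (L, CARD, NV)
!     Hypothetical helper for the main program: decode the NV
!     IFV codes for function L from one card image.
      IMPLICIT NONE
      INTEGER L, NV, J
      CHARACTER*80 CARD
      INTEGER IFVTAB(5,20)
      COMMON /FVTAB/ IFVTAB
      READ (CARD, '(20I1)') (IFVTAB(L,J), J = 1, NV)
      RETURN
      END

      SUBROUTINE NSUBFV (L, J, IFV)
!     Return the stored code: 0 = stratum sum of variate J used
!     in function L, 1 = variate J not used, 2 = population sum.
      IMPLICIT NONE
      INTEGER L, J, IFV
      INTEGER IFVTAB(5,20)
      COMMON /FVTAB/ IFVTAB
      IFV = IFVTAB(L,J)
      RETURN
      END
```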

K3.3 The Data
As indicated in the previous two sections, there are various types of data to be communicated to the program and there are several options about how each is communicated. There are five types of data input:

1 The control and definition information contained on the Problem Card.

2 The information describing each stratum.

3 The variate values.

4 Information concerning irrelevant strata and variates.

5 Sundry constants, weights, etc. that might be needed.

The information on the Problem Card is always expected in card form (or equivalently on unit five). The stratum information may also be provided in card form, in which case it must be preceded by a format card; or on an unformatted binary file; or through a subroutine NSTRAT, which may itself read cards or could alternatively read the information from a unit other than five (see PC4). The variate values may be read in using the same options as those for the stratum descriptors (see PC6). Information concerning irrelevant strata and variates is not to be read in from cards by subroutines NSUBFV and NSUBHV. This information may be read in from cards by the main program and transferred to the subroutines by common blocks, or it could be read from a unit other than five. Sundry weights, constants, etc. would be best read in from cards by the main program and then transferred to the relevant subroutines by common blocks, although it would be possible to read them from units other than five. If the user does decide to use auxiliary units, care must be taken not to use any units needed for temporary storage (see PC10 to PC14) and to rewind them where relevant.
Data in card form (or equivalently on unit five) are called for in the following order:

1 Any data called for by the main program (see above).

2 The Problem Card (see K3.1).

3 If PC4 = 0, a card giving the format for the following stratum descriptors.

4 If PC4 = 0, or if NSTRAT reads cards, H cards supplying the stratum descriptors.

5 If PC6 = 0, one or two cards (whichever is indicated by PC7) giving the format for the following variate values.

6 If PC6 = 0, or if WINPUT reads cards, the user must here supply the cards which give the V variate values. There will be one or two cards per case (as indicated by PC7) and the cases are to be in order by strata.

K3.4 Printed Output

In order, the printed output is:

1 The required minimum dimensions for arrays A, N and D of the main program, based on the entries of the Problem Card and the stratum descriptors (see K3.2.1). If PC17 = 1, the program stops after printing this information.

Then for each function:

2 If PC9 = 1, a line giving information for each computed derivative. These are grouped by stratum if stratum sums are being considered; if both stratum and population sums are being considered, the results for population sums are given for stratum one. The printed information on each line is

(a) the number of the variate involved in the differentiation

(b) the number of iterations involved in the numerical procedure (** denotes the maximum of 8)

(c) the value of the derivative

(d) the final increment used in the numerical procedure.

If a sum for differentiation is less than $10^{-20}$ in absolute value, the derivative is set to zero, and this fact is indicated in the printed line.

3 If PC8 = 1, the U-statistics are printed, grouped by function and stratum. These are printed in E11.6 format, 10 to a line.

4 The results, consisting of:

(a) the number of the function

(b) the estimated value of the function

(c) the estimated variance of the estimator of the function

(d) the estimated standard deviation (i.e. the square root of (c))

(e) the coefficient of variation (i.e. the standard deviation divided by the estimated value of the function).
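Items (d) and (e) follow directly from (b) and (c). A small check, using a hypothetical helper SDCV, reproduces them from the estimate (.69337524) and variance (.64704124-01) printed for Example A in Section K5.

```fortran
      SUBROUTINE SDCV (EST, VAR, SD, CV)
!     SD is the square root of the variance (item (d)); CV is SD
!     divided by the estimated value of the function (item (e)).
      IMPLICIT NONE
      DOUBLE PRECISION EST, VAR, SD, CV
      SD = SQRT(VAR)
      CV = SD/EST
      RETURN
      END
```

With those inputs SD comes out at about 0.2543700 and CV at about 0.3668577, agreeing with the printed .25437009 and .36685776 to the precision shown.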
K4 The Problem of Insufficient Storage Space
In certain circumstances the amount of storage space indicated in Section K3.2 will be larger than that available on particular systems. To cope with this, a series of internal units may be used to reduce the need for dimensioned arrays. Ignoring possible extra space needed by the user-written subroutines, the upper bounds to the storage space are:

1 floating-point: $a \le VR + K + H + 4V$

2 integer: $n \le 2H + V$

3 double precision: $d \le VH + 2V$ (i.e. $2d$ words)

where

H = number of strata

V = number of variates

R = total number of sample cases for all strata

K = number of sample cases in the largest stratum.

These upper bounds may be much higher than those actually needed in any particular situation. The user may make a preliminary run (set PC17 to 1), in which case the program will calculate the appropriate upper bounds and then stop. If these upper bounds are within the available space, the user should skip the remainder of this section.

If the calculated upper bounds are too high, the user may try each of the following strategies in turn.
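The three bounds are easy to evaluate before deciding whether a preliminary run is worthwhile. A sketch with hypothetical names (BOUNDS, NA, NN, ND); ND counts double-precision elements, so the space it represents is 2*ND words.

```fortran
      SUBROUTINE BOUNDS (H, V, R, K, NA, NN, ND)
!     Upper bounds of Section K4 for arrays A, N and D, ignoring
!     space needed by the user-written subroutines.
      IMPLICIT NONE
      INTEGER H, V, R, K, NA, NN, ND
      NA = V*R + K + H + 4*V
      NN = 2*H + V
      ND = V*H + 2*V
      RETURN
      END
```

With H = 10, V = 20, R = 500 and K = 60, for instance, NA = 10150, of which the VR term alone contributes 10000, which is why that term is generally the most troublesome.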

K4.1 Strategy One - Discard Variate Values
Generally the most troublesome storage problem is represented by the term VR in the floating-point storage space. This is reduced to 1 if the user sets PC10 to 1 (format I1) and provides a temporary storage file on unit 20. If more storage space is still required, try Strategy Two.

K4.2 Strategy Two - Reduce Number of Cases
In order to reduce the number of cases to be transferred to subroutine VARNCE, the user may place a value K* into PC11 (format I5). This will ensure that the U-statistics are transferred to subroutine VARNCE in sets of K* rather than sets of K. A temporary storage file must be provided on internal unit 29, and the size of K* is left to the discretion of the user (it should, of course, be smaller than K). This strategy reduces the second term in the floating-point storage space to K*.

If this strategy is used, the subroutine VARNCE must be altered to allow the reading of U-statistics from unit 29 (which is referred to internally as INV). The first line of the subroutine is (see Section K3.2.3):

      SUBROUTINE VARNCE (T,M,I,N,FR,Q,INV)

In this statement, N is the total stratum size (which, for the largest stratum, corresponds to K), M is the size of the array T which holds the U-statistics (and which will equal K* if K* < N, and N otherwise), and INV is the unit on which the U-statistics are stored. The subroutine must be written so as to expect to find the U-statistics in array T if M ≥ N, or to read the U-statistics from unit INV in A + 1 sets of M if M < N, where A is the largest integer smaller than $N/M$, the last set containing only $B = N - MA$ values rather than M values. This is a rather complicated procedure, but a series of statements such as the following will accomplish it.

      SUBROUTINE VARNCE (T,M,I,N,FR,Q,INV)
      DIMENSION T(M)
      L = 0
C     ... initialize variables, etc.
      ND = (N - 1)/M + 1
      DO 2 J = 1,ND
      IF (ND.EQ.1) GO TO 1
      READ (INV) T
    1 DO 2 K = 1,M
      L = L + 1
      IF (L.GT.N) GO TO 3
C     ... calculate the variance contribution from each
C         set of M U-statistics
    2 CONTINUE
C     ... put the total variance into Q
    3 Q = ...
      RETURN
      END
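The set arithmetic in this skeleton can be checked in isolation. A sketch under the hypothetical name NSETS: integer division gives A = (N - 1)/M, which is the largest integer smaller than N/M as required above, so the number of sets A + 1 equals ND in the skeleton and the last set holds B = N - MA values.

```fortran
      SUBROUTINE NSETS (N, M, NDS, NB)
!     Split N U-statistics into sets of M: NDS = A + 1 sets in
!     all, the last holding NB = B = N - M*A values.
      IMPLICIT NONE
      INTEGER N, M, NDS, NB, A
!     Integer division: A is the largest integer below N/M.
      A = (N - 1)/M
      NDS = A + 1
      NB = N - M*A
      RETURN
      END
```

For N = 75 and M = 20 this gives four sets, the last holding 15 values; for N = 40 and M = 20 it gives two full sets, and when M ≥ N a single set of N values.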

K4.3 Strategy Three - Discard Stratum Variate Sums
In order to reduce the number of variate sums transferred to function F in dimensioned arrays, the user may set PC12 to 1 and provide a temporary storage file on internal unit 21. This will reduce the first term in the double-precision storage space, VH, to 1. Note that if only population sums are to be considered for all variates (i.e. if either PC16 = 2, or PC16 = 1 and all values of IFV are either 1 or 2), then this strategy is irrelevant.



If this strategy is employed the function F must be altered to allow for the reading of variate totals from an internal unit. The first two lines of the function must be

      DOUBLE PRECISION FUNCTION F (T,NT,S,MS,NS,R,NR)
      COMMON/COMMF/L,INP,I

The variable L is, as before, the number of the function to be evaluated. The variable I is the number of the stratum which has had its variate totals adjusted for the purpose of numerical differentiation; the values of these variate totals are contained in array R (of dimension V).

The variable INP contains the number of the internal unit on which all the non-adjusted variate totals reside; these are in H groups of V. In order to understand the procedure described below, the user must realize that the function F is used for two different purposes: first, it is called several times in the process of numerical differentiation; second, it is called to provide an estimate of the function which is printed as part of the output. The information regarding which purpose is appropriate is conveyed to the function using variable I:

1 if I > 0, the function is needed as part of the numerical differentiation process;

2 if I = 0, the function is needed for the final estimate.

Case 1: I > 0. In this case the values of the V variate sums for stratum I are contained in array R. In the first step they should be manipulated in the appropriate way and the results stored elsewhere. The pointer on the internal unit INP will now be located at the first variate sum for the first case in stratum I + 1. The set of variate sums for stratum I + 1 is then read in from unit INP, the appropriate manipulation carried out, and the results stored elsewhere. This process is repeated for strata I + 2, I + 3 and so on up to stratum H. Unit INP is then rewound and the same process repeated for strata 1, 2 ... I - 1. Note that the set of variate sums for stratum I is not to be read from unit INP.



Case 2: I = 0. In this case the H sets of variate sums are to be read from unit INP and handled appropriately. The pointer on unit INP will be pointing at the first case of stratum 1.



The following set of statements would be one way of implementing the above procedure. For simplicity it is assumed that there is only one function to be evaluated by function F; H represents the number of strata.

      DOUBLE PRECISION FUNCTION F (T,NT,S,MS,NS,R,NR)
      DOUBLE PRECISION T(NT),S(MS,NS),R(NR)
      COMMON/COMMF/L,INP,I
      IF (I.EQ.0) GO TO 1
C     Variate totals for stratum I are contained in array R.
C     Suitable manipulations are carried out and the results
C     stored elsewhere.
      IF (I.EQ.H) GO TO 3
    1 IP = I + 1
      DO 2 K = IP,H
      READ (INP) R
C     Variate totals for strata I + 1 ... H are read in, one
C     stratum at a time, manipulated, and the results stored.
    2 CONTINUE
    3 REWIND INP
      IF (I.LE.1) GO TO 5
      IM = I - 1
      DO 4 K = 1,IM
      READ (INP) R
C     Variate totals for strata 1, 2 ... I - 1 are read in, one
C     stratum at a time, manipulated, and the results stored.
    4 CONTINUE
C     Calculate F using the stored values of the variate totals
C     for all strata.
    5 F = ...
      RETURN
      END
K4.4 Strategy Four - Delete Stratum Descriptors

In order to eliminate the need to store stratum descriptors in dimensioned arrays, the user may place a 1 in PC13 and provide temporary storage on unit 19. This will have the effect of replacing the term H in the floating-point total by 1, and of replacing the term 2H in the integer total by 2.



K4.5 Strategy Five - The Last Resort
If, after resorting to all the above measures, there is still insufficient storage space, the user may enter in PC14 the value V* (format I5), which will cause each set of V variates to be dealt with in sets of size V*. The user will need to provide temporary storage on units 22 through 27, and also on unit 28 if both

(a) population sums are used, and

(b) not all variables are defined for all strata.

This will reduce the terms 4V, V and 2V in the floating-point, integer and double-precision totals to 4V*, V* and 2V* respectively.
The implementation of this strategy involves the alteration of subroutine WINPUT and of function F. First, some notation is necessary; let V and V* be as above, and let

$$A = \text{the largest integer not exceeding } V/V^{*}, \qquad B = V - AV^{*},$$

$$C = \begin{cases} A & \text{if } B = 0 \\ A + 1 & \text{if } B > 0. \end{cases}$$

The subroutine WINPUT must be written so as to supply to the calling program, for each case, C subsets of V* variates, with only B variates contained in the last subset if B > 0. The modifications to subroutine WINPUT are basically the same as those made for subroutine VARNCE in Strategy Two (see K3.2.5 and K4.2).
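The same quantities for Strategy Five can be computed as below, reading A as the integer part of V/V* (the reading under which B = 0 can occur). NSUBS is a hypothetical name, and VS stands for V*.

```fortran
      SUBROUTINE NSUBS (V, VS, A, B, C)
!     Split V variates into subsets of VS (= V*): A full subsets,
!     B variates left over, C subsets in all.
      IMPLICIT NONE
      INTEGER V, VS, A, B, C
      A = V/VS
      B = V - A*VS
      IF (B .EQ. 0) THEN
         C = A
      ELSE
         C = A + 1
      END IF
      RETURN
      END
```

With V = 5 and V* = 2, the values used in Example B below, C = 3 subsets are needed and the last holds a single variate.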

The alterations to function F differ depending upon whether population sums or stratum sums are involved. In either case the user must provide a common statement,

      COMMON/COMMF/L,INP,I,INT,M

If population sums are involved, the array of population sums is given in C sets of V*. Just as for Strategy Three, the variable I determines the use to which F is to be put. If I = 0, the population sums are read from unit INT in C sets of V* and all are used to calculate F. If I > 0, then the array T will contain the V* population sums for set M; these are manipulated and then, as in Strategy Three, the remaining sets M + 1, M + 2 ... C are read in and manipulated, the unit INT is rewound, and the sets 1, 2 ... M - 1 are also read in and manipulated. The Fortran example in K4.3 may be modified as follows to implement this procedure. Let c be the number of subsets (i.e. c = C).



      DOUBLE PRECISION FUNCTION F(T,NT,S,MS,NS,R,NR)
      DOUBLE PRECISION T(NT),S(MS,NS),R(NR)
      COMMON/COMMF/L,INP,I,INT,M
      IF (I.EQ.0) GO TO 1
C     Manipulate the variate sums for subset M.
      IF (M.EQ.c) GO TO 4
      IP = M + 1
      GO TO 2
    1 IP = I + 1
    2 DO 3 K = IP,c
      READ (INT) T
C     Manipulate the variate sums for subsets M + 1 ... c.
    3 CONTINUE
    4 REWIND INT
      IF (I.EQ.0.OR.M.EQ.1) GO TO 6
      IP = M - 1
      DO 5 K = 1,IP
      READ (INT) T
C     Manipulate the variate sums for subsets 1, 2 ... M - 1.
    5 CONTINUE
C     Calculate F.
    6 F = ...
      RETURN
      END



If stratum sums are involved, a procedure combining both the alterations to F described in Strategy Three and those above for the case of population sums must be implemented. The procedure is described below.
The array of stratum sums is given in H sets of V (one for each stratum), each of these broken into C subsets of V*. For I > 0 the current values of R are for set I, subset M. The user may use these if desired, and (whatever the value of I) must:

1 for M < C, read from INP sets of V* values, one at a time, and do the corresponding calculations for subsets M + 1, M + 2 ... C for set I;

2 for I < H, read C subsets and do likewise for each subset, for each of the strata I + 1, I + 2 ... H;

3 rewind INP;

4 for I > 1, read C subsets and do likewise for each subset, for each of strata 1, 2 ... I - 1;

5 for I > 0 and M > 1, read subsets 1, 2 ... M - 1 and do likewise for stratum I.
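Steps 1 to 5 amount to one circular pass over the file, starting just after the current (stratum, subset) pair. The sketch below, under the hypothetical name ORDER5, lists the pairs in the order they would be read when I > 0. Zero-trip DO loops take care of the conditions M < C, I < H, I > 1 and M > 1, and step 3 (the rewind) needs no counterpart in a list.

```fortran
      SUBROUTINE ORDER5 (H, C, I, M, IS, JS, NREC)
!     List, in reading order, the (stratum, subset) pairs that F
!     must process after the current pair (I, M); needs I >= 1.
      IMPLICIT NONE
      INTEGER H, C, I, M, NREC, IS(*), JS(*), II, JJ
      NREC = 0
!     Step 1: remaining subsets M+1 ... C of stratum I.
      DO JJ = M + 1, C
         NREC = NREC + 1
         IS(NREC) = I
         JS(NREC) = JJ
      END DO
!     Step 2: all C subsets of strata I+1 ... H.
      DO II = I + 1, H
         DO JJ = 1, C
            NREC = NREC + 1
            IS(NREC) = II
            JS(NREC) = JJ
         END DO
      END DO
!     Step 4 (after the rewind): all subsets of strata 1 ... I-1.
      DO II = 1, I - 1
         DO JJ = 1, C
            NREC = NREC + 1
            IS(NREC) = II
            JS(NREC) = JJ
         END DO
      END DO
!     Step 5: subsets 1 ... M-1 of stratum I.
      DO JJ = 1, M - 1
         NREC = NREC + 1
         IS(NREC) = I
         JS(NREC) = JJ
      END DO
      RETURN
      END
```

For H = 3, C = 2, I = 2 and M = 1 the order is (2,2), (3,1), (3,2), (1,1), (1,2): each of the other H*C - 1 = 5 pairs exactly once.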

K4.6 Strategy Six - The Ultimate Resort

If, after attempting all the above strategies, there is still insufficient storage space, the user is advised to try a trip to Tahiti. It may not solve the storage problems, but from a new perspective the user may well decide it does not matter.

K4.7 Summary 

The situation is summarized in Table K4.1. 
Table K4.1 Summary of Strategies for Use of Temporary Storage

Strategy  Indicator on   Columns  User must      Space saved  Problems
          Problem Card            provide unit
1         PC10 = 1       21       20             VR - 1       none
2         PC11 = K*      22-26    29             K - K*       alterations to VARNCE
3         PC12 = 1       27       21             VH - 1       alterations to F
4         PC13 = 1       28       19             3H - 3       none
5         PC14 = V*      29-33    22-28          6V - 6V*     alterations to WINPUT and F



K5 Sample Inputs and Outputs

These examples, provided by the program's authors, are deliberately contrived to illustrate features of the program.

Example A

Main Deck (see K3.2.1):

      DIMENSION A(60),N(60)
      DOUBLE PRECISION D(60)
      DIMENSION T(1),NB(1)
      DIMENSION WT(5,3)
      COMMON/ASAMP/NB,T
      COMMON/AW/WT,K
      READ (5,60) (NB(I),I=1,1)
   60 FORMAT(10I8)
      READ (5,63) (T(I),I=1,1)
   63 FORMAT(20F4.0)
      K = 0
      DO 51 L=1,3
      READ (5,600) WT(1,L),WT(2,L)
  600 FORMAT(2F1.0)
      WT(3,L) = WT(1,L)**2
      WT(4,L) = WT(2,L)**2
   51 WT(5,L) = WT(1,L)*WT(2,L)
      CALL PREPAR(A,60,N,60,D,60)
      STOP
      END
Subroutines:

(a) (see K3.2.5)

      SUBROUTINE WINPUT(W,M)
      DIMENSION WT(5,3),W(M)
      COMMON/AW/WT,K
      K = K + 1
      DO 54 J=1,5
   54 W(J) = WT(J,K)
      RETURN
      END

(b) (see K3.2.4)

      SUBROUTINE NSTRAT(I,N,FR,NT,IV)
      DIMENSION NB(1),T(1)
      COMMON/ASAMP/NB,T
      N = NB(I)
      FR = T(I)
      NT = 0
      IV = I
      RETURN
      END

(c) (see K3.2.2)

      DOUBLE PRECISION FUNCTION F(XT,NT,XS,MS,NS,XR,NR)
      DOUBLE PRECISION XT(NT),XS(MS,NS),XR(NR)
      F = (XT(5) - XT(1)*XT(2)/75.)/
     1DSQRT((XT(3) - XT(1)**2/75.)*(XT(4) - XT(2)**2/75.))
      RETURN
      END

(Value of L not needed as there is only 1 function.)
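As a check on this formula: F is the sample correlation of the two variates, written in terms of estimated population totals (sample sums divided by the sampling fraction, with 3/0.04 = 75 as the estimated population size). The sketch below, with hypothetical names TOTALS and CORR75, recomputes the estimate from three cases (1,1), (0,3) and (2,8), which are the values on the variate cards of Example B (described as the same problem), and a sampling fraction of 0.04.

```fortran
      SUBROUTINE TOTALS (X, Y, NC, FR, XT)
!     Estimated population totals of X, Y, X**2, Y**2 and X*Y:
!     sample sums expanded by 1/FR.
      IMPLICIT NONE
      INTEGER NC, I
      DOUBLE PRECISION X(NC), Y(NC), FR, XT(5)
      DO I = 1, 5
         XT(I) = 0.D0
      END DO
      DO I = 1, NC
         XT(1) = XT(1) + X(I)
         XT(2) = XT(2) + Y(I)
         XT(3) = XT(3) + X(I)**2
         XT(4) = XT(4) + Y(I)**2
         XT(5) = XT(5) + X(I)*Y(I)
      END DO
      DO I = 1, 5
         XT(I) = XT(I)/FR
      END DO
      RETURN
      END

      DOUBLE PRECISION FUNCTION CORR75 (XT)
!     Same expression as function F above, split into parts.
      IMPLICIT NONE
      DOUBLE PRECISION XT(5), SXY, SXX, SYY
      SXY = XT(5) - XT(1)*XT(2)/75.D0
      SXX = XT(3) - XT(1)**2/75.D0
      SYY = XT(4) - XT(2)**2/75.D0
      CORR75 = SXY/SQRT(SXX*SYY)
      RETURN
      END
```

The result is about 0.6933752, matching the .69337524 printed for this example.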
Data Deck:

Card 1: Column 8 = 1 (see K3.2.1 and K3.2.4)

Card 2: Columns 1-4 = 0.04 (see K3.2 and K3.2.4)

Card 3: Column 1 = 1, Column 2 = 1 (see K3.2 and K3.2.5)

Card 4: Column 1 = 0, Column 2 = 3 (see K3.2 and K3.2.5)

Card 5: Column 1 = 2, Column 2 = 8 (see K3.2 and K3.2.5)

Card 6: Column 4 = 1 (PC1)
        Column 8 = 1 (PC2)
        Column 12 = 5 (PC3)
        Column 13 = 2 (PC4)
        Column 16 = 2 (PC6)
        Column 35 = 2 (PC16)

Printed Output:

DIMENSION LIMITS ARE 33 7 7

FUNCTION 1 .69337524+00 .64704124-01 .25437009+00 .36685776+00

Example B (Same problem as Example A)

Main Deck (see K3.2.1):

      DIMENSION A(60),N(60)
      DOUBLE PRECISION D(60)
      CALL PREPAR(A,60,N,60,D,60)
      STOP
      END

Subroutines:

(a) (see K4.2 and K3.2.3; note that this is the simple random sample formula)

      SUBROUTINE VARNCE (T,M,I,N,FR,Q,INU)
      DIMENSION T(M)
      L = 0
      Q = 0.
      B = 0.
      ND = (N - 1)/M + 1
      A = N
      DO 4 J = 1,ND
      IF (ND.EQ.1) GO TO 3
      READ (INU) T
    3 DO 4 K = 1,M
      L = L + 1
      IF (L.GT.N) GO TO 5
      S = L
      B = B + T(K)
      IF (L.EQ.1) GO TO 4
      Q = Q + (S*T(K) - B)**2/(S*(S - 1.))
    4 CONTINUE
    5 Q = (1. - FR)*A*Q/(A - 1.)
      RETURN
      END

(b) (see K3.2.2 and K4.5)

      DOUBLE PRECISION FUNCTION F(XT,NT,XS,MS,NS,XR,NR)
      DOUBLE PRECISION XT(NT),XS(MS,NS),XR(NR)
      COMMON/COMMF/L,INP,I,INT,M
      DOUBLE PRECISION XP(6)
      J = 2*M
      IF (M.EQ.0) GO TO 1
      XP(J-1) = XT(1)
      XP(J) = XT(2)
      IF (M.EQ.3) GO TO 4
    1 MP = M + 1
      DO 3 K = MP,3
      READ (INT) XT
      XP(J+1) = XT(1)
      XP(J+2) = XT(2)
    3 J = J + 2
    4 REWIND INT
      J = 0
      IF (M.LE.1) GO TO 6
      MP = M - 1
      DO 5 K = 1,MP
      READ (INT) XT
      XP(J+1) = XT(1)
      XP(J+2) = XT(2)
    5 J = J + 2
    6 F = (XP(5) - XP(1)*XP(2)/75.)/
     1DSQRT((XP(3) - XP(1)**2/75.)*(XP(4) - XP(2)**2/75.))
      RETURN
      END

Data Deck:

Card 1: Column 4 = 1 (PC1)
        Column 8 = 1 (PC2)
        Column 12 = 5 (PC3)
        Column 19 = 1 (PC8)
        Column 20 = 1 (PC9)
        Column 21 = 1 (PC10)
        Column 26 = 2 (PC11)
        Column 28 = 1 (PC13)
        Column 33 = 2 (PC14)
        Column 35 = 2 (PC16)
Card 2: Columns 1-15 = (I8,F4.0,I2,I1) (see K3.1.2)
Card 3: Column 8 = 3
        Columns 13-14 = 75
        Column 15 = 3
Card 4: Columns 1-7 = (2F2.0)
Cards 5-13: 0101
            0101
            0100
            0003
            0009
            0000
            0208
            0464
            1600
(see K3.1.3)
Printed Output:

DIMENSION LIMITS ARE 11 4 4

DERIVATIVES FOR FUNCTION 1

  1   3   -.83205026-02   .27159816-05
  2   3   -.12800773-02   .15624523-04
  3   3   -.69337523-02   .41796269-05
  4   3   -.53336558-03   .61858479-04
  5   3    .55470018-02   .14210732-04

U-VALUES FOR FUNCTION 1, STRATUM 1
 -.288018+00 -.216013+00  .232831-07

FUNCTION 1 .69337524+00 .64704124-01 .25437009+00 .36685776+00

Note: Internal units 19, 20, 22, 23, 24, 25, 26, 27 and 29 are all used.
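The U-values and variance printed for Example B can be tied together with the updating formula used in its VARNCE: B accumulates the running sum of the U-statistics, and each case after the first adds (S*T(K) - B)**2/(S*(S - 1)) to the sum of squared deviations, which is then multiplied by (1 - FR)*A/(A - 1). The sketch below is a single-array version under the hypothetical name SRSVAR (no reading from a unit), applied to the printed U-values with FR = 0.04 and three cases.

```fortran
      SUBROUTINE SRSVAR (U, N, FR, Q)
!     Simple random sampling variance from N U-statistics, using
!     the same running-sum update as Example B's VARNCE.
      IMPLICIT NONE
      INTEGER N, L
      DOUBLE PRECISION U(N), FR, Q, B, S
      Q = 0.D0
      B = 0.D0
      DO L = 1, N
         S = L
         B = B + U(L)
!        Add this case's contribution to the sum of squared
!        deviations about the running mean.
         IF (L .GT. 1) Q = Q + (S*U(L) - B)**2/(S*(S - 1.D0))
      END DO
!     Finite population correction and scaling, as in VARNCE.
      Q = (1.D0 - FR)*N*Q/(N - 1.D0)
      RETURN
      END
```

This gives about 0.064704, in agreement with the printed variance .64704124-01; for the values 1, 2, 3 with FR = 0 it gives 3, i.e. N times the ordinary sample variance.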



Example C

Main Deck: same as for Example B

Subroutines:

(a) (see K3.2.6)

      SUBROUTINE NSUBHV (I,J,IHV)
      IHV = MIN0(I-1,J-1)
      RETURN
      END

(b) (see K3.2.7)

      SUBROUTINE NSUBFV (L,J,IFV)
      IFV = MIN0(L-1,J-1)
      RETURN
      END

(c) (see K3.2.2)

      DOUBLE PRECISION FUNCTION F(XT,NT,XS,MS,NS,XR,NR)
      DOUBLE PRECISION XT(NT),XS(MS,NS),XR(NR)
      COMMON/COMMF/L
      F = XS(2,1)/XS(1,1)
      IF (L.EQ.2) RETURN
      F = F*XS(1,2)
      RETURN
      END

Data Deck:

Card 1: Column 4 = 2
        Column 8 = 2
        Column 12 = 2
        Column 34 = 1
        Column 35 = 1
Card 2: Columns 1-13 = (I1,F1.0,2I1)
Card 3: Column 1 = 2
        Column 3 = 4
Card 4: same as Card 3
Card 5: Columns 1-7 = (2F1.0)
Cards 6-9: 24
           26
           00
           20

Printed Output:

DIMENSION LIMITS ARE 19 9 6

FUNCTION 1 .50000000+01 .13000000+02 .36055513+01 .72111025+00
FUNCTION 2 .50000000+00 .12500000+00 .35355339+00 .70710678+00
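The two printed estimates can be reproduced by hand if XS(I,J) is read as the sum of variate J over the sample cases of stratum I; this index order is an inference from the printed results, not something the text states. With the cards above, stratum 1 has cases (2,4) and (2,6) and stratum 2 has (0,0) and (2,0), the second variate of stratum 2 being a dummy because NSUBHV marks it as absent. A sketch under the hypothetical name FEXC, with the function number passed as an argument instead of through /COMMF/:

```fortran
      DOUBLE PRECISION FUNCTION FEXC (L, XS)
!     Function F of Example C, assuming XS(I,J) = sum of variate
!     J over the sample cases of stratum I (an inference).
      IMPLICIT NONE
      INTEGER L
      DOUBLE PRECISION XS(2,2)
!     Function 2: ratio of the variate-1 sums of strata 2 and 1.
      FEXC = XS(2,1)/XS(1,1)
      IF (L .EQ. 2) RETURN
!     Function 1: that ratio times the variate-2 sum of stratum 1.
      FEXC = FEXC*XS(1,2)
      RETURN
      END
```

With XS(1,1) = 4, XS(1,2) = 10 and XS(2,1) = 2 this gives 5 and 0.5, the values printed for functions 1 and 2.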
K6 Listing of the Program



SUS^CUTiMc r^t.?ARU I IA f 3 t i:i l D,n«) £5000010 

DIriEN5I0H.-An»i..NjUlhraf(i2l 00000020 

DOUBLE rRtC t I5ICfi UilliJ OOGOOO^O 

ccahOii/f-acr i Uf_ i * jyy kk;:.U ^ jU i i kk ..ks ^jskv i.tif.v ^ ;ttvi» iks oooooo«;o 

UrK 00000040 

65 -GKi^T(3U f 2{il ,1:1,3:1 t IS,2ri, 15,311! 00000070 

ha * o ooocooso 

::.3 * i oooooCtO 

iiiS'l? .__ 00000100 

if (JiiS .07, O: GO 73 6 0000^10 

SEtfiSitii.?*? ooooojio 

6i FOREST 1 OOOOCISO 

6 i.c ; ooooot^o 
ir <j.;s - n 2,3,-; ooaootso 

4 call HSf&AfCiiiiiiiKiPt.ivi ooooojso 

60 TO 5 C00^01?0 

3 Uitfiifc'Si .iU.r^.st.iv 000001 so 

GO TO 3 O0OOO11O 

2 RtnuiS.r n* j «.i t Frv f it7 f 00000200 

5 IF iJtJ .eO. O .H.iu. itf .i.7. 2> 50 75 9 OOOOy^iO 

1D3 i iiAA0(;:.3 r imj 00000220 

? i? wil_.10. Oi 50 70 7 00000230 

SS ^ ST 00000240 

B * ifrf 00000230 

rR * 000*02*0 

7 Ir <K3 0 .OR. ft* .07. 0) 00 70 I 0v0..^7O 
US77EtIii$; Nf.fS.IJ 0000)230 

s i;R « ^ ^ HU oooooiro 

Ir %K3 .07. 0 .5*. fi'rk .07. }) 00 70 \ 00000300 

AilS * rk 00^00310 

iiili = nil 00050320 

iiCl « ««) - IV 00000330 

rCS.iTt.ivt 00CO034O 

le ?KS .£3. 0 . C : R. rr* .07. v> 00 70 24 0O0O033O 

vzsViii :«.s ooooosoo 

:;;t i i 00000370 

If In3:;v7; 0! 00:70 51 00000380 

iSI ^ in 0G0O037O 

x; 12 - ; ♦ ;ti ooooo^oo 

J2 ^ 1 ♦ 000004 10 

53 * is ♦ III 0000042S 

i:.:a * ; 00000430 

;s:s * i ooooo«ho 

Ir ii:C .07. 0) 50 TO 12 OOOOOOO 

n-2u - ooooo;oy 

ll>2* i i.V 000004^0 

12 13 * 12 * llZ^U'lt 00vC04o? 

• If U:h .£0. u .0*. KK .07. m» 00 70 13 000w04?0 

ID3 » Kfi OvOOOSOO 



103 

11* 



be ;o u 



13 U * 13 + 153 

14 rCDlA a 1 

ri»;s « ; 

.KrV. 

IF !;?r? ,*Z* 1» 015 TO 15 
H* * \ 
itf • i 
DO io L»i_ i 
DC 16 J-M 

CAfcl 5S«EViL*Jii?Vi 

ir iiw .la. o .Siiu. as .eg, 2) go to is 
:r Cia ,ea. h go to i? 

iifvT *_0 

GO TO 55 
I a tsVit * 2 _ _ 
15 lr Cw?vT .EG. 2 .OR. KK .EC. 

KD?h * tin 

18 K2 = 1 ♦ ktt*»i%S5 

If .irtVV ,EG. Si GO To t? 
itVS * «vtf 

19 13 = U •> KVil 

16 * 13 ▼ ittrf 

17 * !6 t iiv.i 

jT 3 33 ' iivil - » 
Kl«2 - • 

\r t»Jr VT .EG. 0/ GO T3 20 

... Ku2 i.MC* 

20 k3 - K2 * KL2 

k:.3 = ; 

It wirVT .£5. 
__ k'u3 - 

21 KT * U3 * KI.3 

- i 

it uirvr .EG. vi CO TO 22 

I? iwf^r ,cu. 2 •Hi-:.. linv .c3. o; gg to :2 

ai:tUi8Ai_itJtiSi 
&o r^..ATi;x, ;l..i:c-.; cMiTS «*e 3r.A,;3n 

Ir i."?* ,ST. 0 .yJv. ;t ,gt. :a, 00 TO 23 
;? l $t » G ? . iii ;G?:. kT .61- Ifci.w.i* .13 



2 ;0?;. i:H .cd. Oi Cm'* TO 21 




L Aii5i.«iii'#.i!V« t 5-/5O01?« 







lyi^uiiiiz ge^Vh^y,::.^*^:;.:*^^ _ _V." 

iitoV.ivv.iD.Ar ,KMi5.i:r«?r.:*7, Ai,::r-j; T/Oooro:? 

oii*tia;G«^rur.^ 00001030 

iYY:,;7;ri t v:;i5/i f irrY<i;iri f ;v7{K«ri r iDi;ivMi 000010*0 

izuili nzziiK* i?:ii;« |I ^ipi l n(K^j r Ax;^3i r r r H? r «i oooo;oso 

yOJoL£ r^cis;;.;; si^s.rj,;.;,;^ ooooiooo 

CC^.W??£?^;;^ 00001070 

CGiM:o.i/CGit,;r/Cii^r;:;i:»TA t js oooo;oso 

gat* ;^.:;;3H/i;op,;;aB,:u:.,:uY,:;ir r iuU/ ooooivfo 

;:0;:2;23,23,2o, 27,25,29/ O0OO51OO 

wv* - srv - i>/i;vH ♦ 1 0000; tto 

iii?"S? 0O0O112O 

l5iTT.= 24 . G0v0ii30 

. HVR * - MVHttftfD - I) 00005140 

P>tA AD,EX:,Er, , AVAL/ I ,E5,5 .£3,1 .E3,1 .£-20/ ' 00005 53? 

IF wnU .37. 0) GO 70 AOS 00001HO 

«7M x i2MAX0(K?rJ,1) 00001 170 

seao{5,635 (rii7Ui;) l ui,^7;i> oooo;iao 

_ uRi7£U f A-;i irii7y-ns r x*i r K7«) oooomo 

S3 rOrCHAYC^o/ 0000 1300 

405 *y » o 00001210 

DO 110 l<*t v icn CO GO? 220 

2r US .EG. 0) 00 70 40V 0000523? 

*ead* xasi ;oAhf-,rftAC,;yAs 0000:240 

GO 70 AOS! * 0C0U2S0 

404 iiSA,-.? c ,fi.V% I i OOOOlCdO 

?*a: , Ty ci; * oooos:;o 

i»;ar * ivitii 00001*30 

4035 JC = 0 O0O01290 

oo nsi jri-;,iiv& • • ooootsco 

nVc * ^:; oi<;c.i,uv-^cj ooooisto 

i>0 1153 :-},.;VC 00001320 

JC * Ju t 1 . COOOtSGO 

It U .G7. 1 .0*. hfVT .EG. 0) 50 7-: 1133 OOO0534O 

a7JJ) • 0. 00001330' 

1533 SGUi 0O0O53JO 

;? i™? .£o. oi 50 70 ns; ooooisro 

EAtt-Si'SUJnVii.jC.rriW) 00001330 

Ir ilHV .EG, v) 00 i0 1133 ' O0O0139O 

I?U> » 0 O0O0UO0 

1133 Citfi'inUS 00001410 

ir.cif'vt. .to. h oo 70 oooom?o 

^neciiM oooouso 

it ii ,67. i .o?.. .;r«7 .eo. Oi go 70 nsi 00061440 

L^«7t«2 .7Ai a? OvvOUSO 

5535 Umlaut 00005 UO 

ii.vo .eo. ;i go 70 ns* 0^005470 

i«i oooonso 

:r %: .c;. 5 .g;. «uvt .20. 0*1 go to 1:34 00005490 

_ *£W5.m'. 2m7h 000*13/* 

its-; go ;: ;:*;,MSA..f 00001310 







BO 121 JO»1,NVO 
Ir WW U 4055,4052»4053 
4053 CALL UltfrOTrd.KViU 

A3 TO 

4052 R£m.{**U> U 
_ . SO 70 404 ^ 

V 404 * Ki*0<Kvd f i*<V-JCi 

Ir •57. ( I GO TO 4060 

vKIltditttf w 
GO 70 irVOi 

4060 Do J- 1 1 liVC 

JC i JC * t 

4v6l I? {hVu .£0. 1J 50 TO { 223 

*? M 00 TO 1 22-4 

Rtwi_(*tiSoi SU __ "__ 

J 223 If tf .57. \i 50 TO 1221 
• ?2_4 SO 123 J'ljitvC 
3*?C JJ * v. 
?'0 • a t r 

Ir (IL(J) 0# 00 T*' 11 

SWf Ji___*_3U( J) » i<CJ# 

wOrtTInut 



22: 



70 12 



?r _ .cu.«_|# GO 

«RI7c t litjp I SU 
C3.i7I!i6t 

jT (fiVo.Eo.n oo 
tzvm iso 

CALL SUITCHCTmSAmHSSJ 
12 CDN7ISUc_. : 
tO 125 JD«1,HV3 
IF <3<tf .£0. || CO TO 124 
SEA0<i«0> ID 
ScS?iIKSA* SU _ v 

ir i«m .so. or go 70 124 

StADUJIT?) XT * , . 1 

124 l»G 114 i-lpXVH ; ; t 1 - r 

Zr (J .07. SV* .ftSS; JO «EO* nVO) GO TO 1270 : ^ 
. ir <IBCJI *E0, 0* 00 T0.«4^.-;^'.j. : ;- r 

Ir (IV* 2 •EO, 9 .0*. iVA* iCfc-.ir SO TO U42 — 
S*KJ> » SWf JJ/TRAC ;./-;•:■, ' - *^^> 

1142 I? iilrVT .EC. 2) GO 70 11422 
Ir U* Mi-Jit GO TO 11420 

MCJJ » S3IJI " O 

_V^_ 00 TO 11421 ■.'X-v-C: 
11420 XMX.Jl i SU(J) 
11421 
1142 











114 CGHTImUc 

1270 IF viiVD .£Q> 1 ) GO TO I25_ 
ir <KFV7 .-Q. 01 GG 70 127 

___ yRI7EIIiu*l_X7_ _ ... .__ . 

• 27 IF CKFVT. .EC. 2 -OR. KH .£0. 01 Go TO S 25 

_ URiTEdiiFi aa 
123 Cuii Tittle 

Ir iSrVf *£0. 2 ,y.w_KH_,tS. 0) GG TS 1230 

IF CIit?5 .07. il Gu 7G 1250 

sa*«iis?i_xx 

i 2^0 If il.'SD .EG. li 00 TO 110 

re-i;;s ikss 

IF (KFv7 .EG. 0) GG TG ISO 
C*LL 5SSiTCniiii7AMii'73i 
HO CGii7IrtG£ 

I" (SftSA .07. 1) GC TO 134 

»-JiSS m 

IF iKS ,EQ. Or- GO 70 134 
_ REUiiO ih5_ - „„____ 

t34 IF uiVl .£0. 1 .OR. KFVT .EQ. 0) GO 70 i31 

RcUliJii ista 

131 IF fiCW .£0. 01 50 TO 132 

?;ESIn:« Iftr 
.32 l«0 102 t=l, »F 

1*0 

JD - 0 

w,t = F(x7 l Ki.: > Ar,KuiA t K:i;2i i /A i ?:i.3i 

A * 0. 

I'G 1 1 = 1 i;H 

IF iJ? .EG. 0> GO 7C 10;; 
IF <iiF77 .i.7. 2i GG 70 ;C02 

UrI7£i^.^i_L 

on FM?ift7{/lA;S3n5£Si-vS7;CE3 FOR Fon'uTIGii 13! 

00 70 IvO: 

i002 »U7E{o,ov4i t,I 

304 FGR;m7</tA t 2:Kt.£RiV;7l;£3 FOR FV.VCTIGU' 
10*1 IF (I .£0. ii GG 70 SC :-3 

if irFv; .eg. 2 .eg. g; go 70 ;03 

;oo3 - 0 

IF .£5. 0 iirVT .EC, 25 GO 70 1004 

IF (itvl* .07. Si 00 70 .004 

;og4 iz i2o jr-i t .?vti ■__ 

IF inVI. .EG. ; .OR. iiFV7 .EG. OS 00 70 ;:g 

REAi.ii;;7tt; at __ _ 

if (i .10. oo 70 ?:s 

___ R£;ut"i,;r; v2 . 

;:s if .eg. o .or. i.'Fvt .eo. 2i go 70 \zr 



I3,ivH t SYRATuii 15) 



:r ihvd .£5, go 70-12? 

RtAui I'rir ) XX 

JC » JC ♦ t 

ir UirV - t> toi 15;t4 
15 CALL uS^rViLijClrV) . 

xr iirv .tQ. n gg to 3io 

CALL KSuSKViI.JC.lKV) .__ 
ir CHV .ECs 1) m0 TO 310 
I? (2fV .£3. 0) GO 70 u 

-.4 ;r.iji * 1 

i? il .G7.1J GO 70 32 
AS * I»A*3iX7i;#i) 
I? (AB XU AVAL) 00 70 31 
00 70 17 
32 YVCJi * YZ(J> 
GO TO 101 

lo IDU> > 2 

ir iKH .07. 0> 00 70 lot 
A* « SAFSUrU.JS). 
DO TO 5? 
161 Ad * DA33(XXiJ)> 
17 Ir (AS .LT. AVAU 00 70 31 
KIT * 2^cXI/A* 
K2r * EXI/I'mK 
2 17 « 0 
IrC » 0 
63 > AB/AD 
DX7 - t00.*M 

9 it =17 ♦ ; 

(17 .57. 3) GO 70 30 



si 

if iRi .C7. go 70 3 



ir UD(J)_.£a, 1) GO 70 173 
Ir UH .07. Oi GO TO \72 

mui) * muu+n.. " ... 

A?- * F 'XT f KD2 1 XPfKO i m , Ki* ! S * XX i KI'3 i 

X?(?,J> '» Xf U,Jl - 2. .51 

Ac * r <XT f , .:r.2 f X? f K?;A t r:siii;,XX f ^3) 

xfciiJi * Xrd.ji ♦ oi 

GO TO 171 

;tz xtu> * xt;jj ♦ di 

Af « riX7 t xr»2 r X?,KtlA t KDl& > AX l i:03) 

XiJJJ i"X?(JI - 2.iM 

ir KM .Zu, M Go 70 1731 

k£A5(I«7A) XT 

X7U> * X7U) - DI 

ir:* si * r(XT,Ki»2 f A?' t i:MH t r;D;p l xx l i:D3i 
xiu> * xiU*.* ;»i 

5f iftvO .£0. Si GG 70 171 



RCHPtXttTA J AT

60 TC I71 